Abstract

The Renyi entropy is a generalization of the Shannon entropy and is widely used in mathematical
statistics and applied sciences for quantifying the uncertainty in a probability distribution.
We consider estimation of the quadratic Renyi entropy and related functionals for the
marginal distribution of a stationary m-dependent sequence. The U-statistic estimators under
study are based on the number of ε-close vector observations in the corresponding sample. A
variety of asymptotic properties for these estimators are obtained (e.g., consistency, asymptotic
normality, Poisson convergence). The results can be used in diverse statistical and computer
science problems whenever the conventional independence assumption is too strong (e.g., ε-keys
in time series databases, distribution identification problems for dependent samples).
1 Introduction

Entropy is applied in information theory and statistics for characterizing the diversity or uncertainty
in a probability distribution. For a continuous distribution $\mathcal{P}$ with density $p(x), x \in R^d$, the Renyi
entropy is defined (Renyi, 1970) as

$$h_s(\mathcal{P}) := \frac{1}{1-s}\log\left(\int_{R^d} p(x)^s\,dx\right), \quad s \neq 1.$$

Henceforth we use $\log x$ to denote the natural logarithm of $x$. The Renyi entropy $h_s$ is a
generalization of the Shannon entropy (Shannon, 1948):

$$h_1(\mathcal{P}) := \lim_{s \to 1} h_s(\mathcal{P}) = -\int_{R^d} p(x)\log p(x)\,dx.$$

From the statistical point of view, the quadratic Renyi entropy

$$h_2 = h_2(\mathcal{P}) = -\log\left(\int_{R^d} p(x)^2\,dx\right)$$

is the simplest point on the Renyi spectrum $\{h_s(\mathcal{P}), s \in S\}$, where $S$ is the subset of $R = R^1$
for which the entropies exist. Note that $h_2 = -\log(q_2)$, where we assume that the quadratic functional

$$q_2 = q_2(\mathcal{P}) := \int_{R^d} p(x)^2\,dx$$

is well defined, and hence the point $s = 2$ belongs to the set $S$. More entropy generalizations are
known in information theory, e.g., the Tsallis entropy (Tsallis, 1988):

$$\frac{1}{s-1}\left(1 - \int_{R^d} p(x)^s\,dx\right), \quad s \neq 1.$$

The Renyi entropy (or information) for stationary processes can be understood as that of the 
corresponding ergodic or marginal distributions, see, e.g., Gregorio and Iacus (2003), where the 
Renyi entropy is computed for a large class of ergodic diffusion processes. 

Numerous applications of the Renyi entropy in information theoretic learning, statistics (e.g., 
classification, distribution identification problems, statistical inference), computer science (e.g., 
average case analysis for random databases, pattern recognition, image matching), and econometrics 
are discussed, e.g., in Principe (2010), Kapur (1989), Kapur and Kesavan (1992), Pardo (2006), 
Escolano et al. (2009), Neemuchwala et al. (2005), Ullah (1996), Baryshnikov et al. (2009), Seleznjev 
and Thalheim (2003, 2010), Thalheim (2000), Leonenko et al. (2008), and Leonenko and Seleznjev 
(2010). 

Various estimators for the quadratic functional $q_2$ and the entropy $h_2$ for independent samples
have been studied. Leonenko et al. (2008) obtain consistency of nearest-neighbor estimators for
$h_2$, see also Penrose and Yukich (2011) and the references therein. Bickel and Ritov (1988) and
Gine and Nickl (2008) show rate optimality, efficiency, and asymptotic normality of kernel-based
estimators for $q_2$ in the one-dimensional case. Laurent (1996) builds an efficient and asymptotically
normal estimator of $q_2$ (and more general functionals) for multidimensional distributions using
orthogonal projection. See also references in these papers for more studies under the independence
assumption.

In our paper, we study U-statistic estimators for $q_2$ and $h_2$ based on the number of ε-close
vector observations (or the number of small inter-point distances) in a sample from a stationary
m-dependent sequence with marginal distribution $\mathcal{P}$. This extends further the results and approach in
Leonenko and Seleznjev (2010) (see also Kallberg and Seleznjev, 2012), where the same estimators
are studied under independence. The number of small inter-point distances in an independent
sample exhibits rich asymptotic behaviors, including, e.g., Poisson limits and asymptotic normality
(see Jammalamadaka and Janson, 1986, and references therein). We show that some of the
established limit results for this statistic are still valid when the sample is from a stationary
m-dependent sequence. It should be noted that our normal limit theorems do not follow from the
general theory developed for degenerate variable U-statistics under dependence, see, e.g., Kim et
al., 2011, and references therein.

Note that the class of stationary m-dependent processes is quite large, see, e.g., the book of 
Joe (1997), where there are numerous copula constructions for m-dependent sequences with given 
marginal distribution, or Harrelson and Houdre (2003), where the class of stationary m-dependent 
infinitely divisible sequences is studied. 
First we introduce some notation. Throughout this paper, it is assumed that the sequence
$\{X_i\}_{i=1}^{\infty}$ of random d-vectors is strictly stationary and m-dependent, i.e., $\{X_b, X_{b+1}, \ldots, X_{b+s}\}$ and
$\{X_{a-r}, X_{a-r+1}, \ldots, X_a\}$ are independent sets of vectors when $b - a > m$. Let $\mathcal{P}$ be the (marginal)
distribution of $X_t$ with density $p(x), x \in R^d$, $p(x) \in L_2(R^d)$, and entropy $h_2(\mathcal{P})$. We write $d(x, y) := \|x - y\|$
for the Euclidean distance in $R^d$ and define $B_\epsilon(x) := \{y : d(x, y) \le \epsilon\}$ to be an ε-ball in $R^d$
with center at $x$ and radius $\epsilon$. Denote by $b_\epsilon(d) := \epsilon^d b_1(d)$, $b_1(d) = 2\pi^{d/2}/(d\,\Gamma(d/2))$, the volume of
the ε-ball. Let $X$ and $Y$ be independent and with distribution $\mathcal{P}$ and introduce the ε-ball probability
as

$$p_{X,\epsilon}(x) := P\{X \in B_\epsilon(x)\}.$$

Two vectors $x$ and $y$ are said to be ε-close if $d(x, y) \le \epsilon$, for some $\epsilon > 0$. The ε-coincidence
probability for independent vectors is written $q_{2,\epsilon} := P(d(X, Y) \le \epsilon) = E\,p_{X,\epsilon}(Y)$. Then the Renyi
ε-entropy $h_{2,\epsilon}(\mathcal{P}) := -\log q_{2,\epsilon}(\mathcal{P})$ can be used as a measure of uncertainty in $\mathcal{P}$ (see Seleznjev and
Thalheim, 2008, Leonenko and Seleznjev, 2010). In what follows, let $\epsilon = \epsilon(n) \to 0$ as $n \to \infty$.
Denote by $|C|$ the cardinality of the finite set $C$ and let $N_n$ be the random number of ε-close
observations in the sample $X_1, \ldots, X_n$,

$$N_n = N_{n,\epsilon} := \left|\{d(X_i, X_j) \le \epsilon,\ i, j = 1, \ldots, n,\ i < j\}\right| = \sum_{i<j} I(d(X_i, X_j) \le \epsilon) =: \binom{n}{2} Q_n,$$

where $I(D)$ is the indicator of an event $D$. Then $Q_n$ is a U-statistic of Hoeffding with varying
kernel. For a short introduction to U-statistics techniques, see, e.g., Serfling (2002), Koroljuk and
Borovskich (1994), Lee (1990). Denote by $\xrightarrow{D}$ and $\xrightarrow{P}$ convergence in distribution and in probability,
respectively. For a sequence of random variables $U_n, n \ge 1$, we write $U_n = O_P(1)$ as $n \to \infty$ if for
any $\delta > 0$ and large enough $n \ge 1$, there exists $C > 0$ such that $P(|U_n| > C) \le \delta$. Moreover, for a
numerical sequence $v_n, n \ge 1$, let $U_n = O_P(v_n)$ as $n \to \infty$ if $U_n/v_n = O_P(1)$ as $n \to \infty$.
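
The quantities just introduced are simple to compute directly. The following is a minimal Python sketch, not taken from the paper: the function names `unit_ball_volume` and `count_close_pairs` are illustrative, observations are assumed to be tuples of floats, and closeness is taken as distance at most ε.

```python
import math
from itertools import combinations


def unit_ball_volume(d):
    # b_1(d) = 2 * pi^(d/2) / (d * Gamma(d/2)), the volume of the unit ball in R^d
    return 2.0 * math.pi ** (d / 2) / (d * math.gamma(d / 2))


def count_close_pairs(sample, eps):
    # N_n: number of pairs i < j with Euclidean distance d(X_i, X_j) <= eps
    return sum(1 for x, y in combinations(sample, 2) if math.dist(x, y) <= eps)


# Toy sample of n = 4 points in R^2 (illustrative data only)
sample = [(0.0, 0.0), (0.05, 0.0), (1.0, 1.0), (1.0, 1.08)]
n_close = count_close_pairs(sample, eps=0.1)
q_n = n_close / math.comb(len(sample), 2)  # Q_n = N_n / binom(n, 2)
```

The pair count is quadratic in the sample size, which is adequate for samples of a few hundred observations such as those used in the numerical examples.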

The developed technique can also be used for estimation of the corresponding entropy-type 
characteristics for discrete distributions (see, e.g., Leonenko and Seleznjev, 2010) and stationary 
m-dependent sequences. In this case, the applied estimator is a U-statistic with fixed kernel and
so the problem is simplified in the way that some already established general results yield the 
limit properties, including consistency and asymptotic normality (see Appendix). An approach to 
statistical estimation of the Shannon entropy for discrete stationary m-dependent sequences can 
be found in Vatutin and Mikhailov (1995). 

The remaining part of the paper is organized as follows. In Section 2, the main results for the
number of small inter-point distances $N_n$ and the estimators of $q_2$ and $h_2$ are presented. Numerical
experiments illustrate the rate of convergence in the obtained asymptotic results. In Section 3, we
discuss applications of these results to ε-keys in time series databases and distribution identification
problems for dependent samples. Section 4 contains the proofs of the statements in Section 2. Some
asymptotic properties of entropy estimation for the discrete case are given in the Appendix.

2 Main results 

We formulate the following assumption about finite dimensional distributions of the stationary 
sequence {Xi}. 
A. The marginal density fulfills $p(x) \in L_3(R^d)$. Moreover, for each 4-tuple of distinct positive
integers $t = (t_1, t_2, t_3, t_4)$, the distribution of the random vector $(X_{t_1}, X_{t_2}, X_{t_3}, X_{t_4})$ has a
density $p_t(x_1, x_2, x_3, x_4)$ in $R^{4d}$ that satisfies

$$g_t(x_1, x_2) := \left(\int_{R^{2d}} p_t(x_1, x_2, x_3, x_4)^2\,dx_3\,dx_4\right)^{1/2} \in L_1(R^{2d}). \qquad (1)$$

Remark 1. (i) The integrability (1) ensures that the dependence among $\{X_i\}$ is weak enough. In
fact, provided $p(x) \in L_2(R^d)$, it holds for an independent sequence, so assumption A is a
generalization of the condition $p(x) \in L_3(R^d)$ used for studying the same estimators under independence
(Kallberg and Seleznjev, 2012).

(ii) If the density $p_t(x_1, x_2, x_3, x_4)$ is bounded for each distinct $t = (t_1, t_2, t_3, t_4)$, the following
condition is sufficient for A: for each distinct pair $(t_1, t_2)$, let the density $p_{t_1,t_2}(x, y)$ of $(X_{t_1}, X_{t_2})$
satisfy

$$p_{t_1,t_2}(x, y) \in L_{1/2}(R^{2d}).$$



In the following examples, let $\{Z_i\}_{i=-\infty}^{\infty}$ be a sequence of independent identically distributed (i.i.d.)
normal $N(0, 1)$-random variables.

Example 1. (i) Assumption A holds for all vector Gaussian sequences $\{X_i\}$. In particular, it is
satisfied for the m-dependent moving average time series MA(m) generated by $\{Z_i\}$, i.e.,

$$X_t = \theta_0 Z_t + \cdots + \theta_m Z_{t-m}, \quad t \ge 1. \qquad (2)$$

(ii) An exponential transformation of time series (2) gives a non-linear sequence

$$X_t = \exp(\theta_0 Z_t + \cdots + \theta_m Z_{t-m}), \quad t \ge 1.$$

The finite dimensional distributions of $\{X_i\}$ are the multivariate log-normal distributions (see, e.g.,
Kotz et al., 2000) and thus A is fulfilled in this case.

(iii) Let $X_t = Z_t/Z_{t+1}, t \ge 1$. Then $\{X_i\}$ is a stationary 1-dependent sequence, say, a Cauchy
sequence. It can be shown that for $t = (1, 2, 3, 4)$,

$$g_t(x_1, x_2) = \frac{C\,|x_2|}{(1 + x_2^2 + x_1^2 x_2^2)^{5/4}} \in L_1(R^2), \quad C > 0,$$

that is, (1) is valid in this case and similarly for other $t$. Since also $p(x) \in L_3(R)$, condition A is
satisfied.
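
For readers who want to experiment with the three sequences of Example 1, the following Python sketch generates them from i.i.d. standard normal innovations. It is only an illustration under the stated definitions; the function names and the use of `random.Random` as the innovation source are assumptions, not part of the paper.

```python
import math
import random


def ma_sequence(n, thetas, rng):
    # X_t = theta_0 Z_t + ... + theta_m Z_{t-m}: Gaussian and m-dependent, m = len(thetas) - 1
    m = len(thetas) - 1
    z = [rng.gauss(0.0, 1.0) for _ in range(n + m)]
    # z[t + m] plays the role of Z_t, so z[t + m - k] is Z_{t-k}
    return [sum(th * z[t + m - k] for k, th in enumerate(thetas)) for t in range(n)]


def lognormal_sequence(n, thetas, rng):
    # Exponential transformation of the MA(m) sequence (log-normal marginals)
    return [math.exp(x) for x in ma_sequence(n, thetas, rng)]


def cauchy_sequence(n, rng):
    # X_t = Z_t / Z_{t+1}: consecutive terms share one innovation, hence 1-dependent
    z = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [z[t] / z[t + 1] for t in range(n)]


rng = random.Random(0)
ma2 = ma_sequence(500, [1 / math.sqrt(3)] * 3, rng)  # MA(2) with N(0, 1) marginal
```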

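2.1 Asymptotic distribution of the number of small inter-point distances

Let the expectation and variance of the number of small inter-point distances $N_n$ be
$\mu_n = \mu_{n,\epsilon} := E N_n$ and $\sigma_n^2 = \sigma_{n,\epsilon}^2 := \mathrm{Var}(N_n)$, respectively. For $h = 0, 1, \ldots$, we introduce the characteristic
$\sigma_{1,h,\epsilon}^2 := \mathrm{Cov}(p_{X,\epsilon}(X_1), p_{X,\epsilon}(X_{1+h}))$. Let

$$\zeta_{1,m} := \lim_{n \to \infty} \frac{1}{n}\mathrm{Var}\left(\sum_{i=1}^{n} p(X_i)\right) = \mathrm{Var}(p(X_1)) + 2\sum_{h=1}^{m} \mathrm{Cov}(p(X_1), p(X_{1+h})). \qquad (3)$$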
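Proposition 1 Suppose that A holds.

(i) Then the expectation and variance of $N_n$ fulfill

$$\mu_n = \binom{n}{2} q_{2,\epsilon} + o(n\epsilon^{d/2}),$$

$$\sigma_n^2 = \frac{n^2}{2} q_{2,\epsilon} + n^3\left(\sigma_{1,0,\epsilon}^2 + 2\sum_{h=1}^{m} \sigma_{1,h,\epsilon}^2\right) + o(n\epsilon^{d/2}) + o(n^2\epsilon^d) \quad \text{as } n \to \infty.$$

(ii) If $n^2\epsilon^d \to a$, $0 < a \le \infty$, and $\zeta_{1,m} > 0$ when $\sup_{n \ge 1}\{n\epsilon^d\} = \infty$, then

$$\mu_n \sim \tfrac{1}{2} b_1(d) q_2 n^2\epsilon^d,$$
$$\sigma_n^2 \sim \tfrac{1}{2} b_1(d) q_2 n^2\epsilon^d + b_1(d)^2 \zeta_{1,m} n^3\epsilon^{2d} \quad \text{as } n \to \infty.$$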

The asymptotic distribution for $N_n$ depends on the rate of decrease of $\epsilon(n)$. Some results for
$N_n$ under the i.i.d. assumption (i.e., m = 0) are obtained in Jammalamadaka and Janson (1986)
(see also Leonenko and Seleznjev, 2010). With only additional weak conditions, we show that these
results are still valid when $\{X_i\}$ is stationary and m-dependent. Let $\mu = \mu(a) := \tfrac{1}{2} b_1(d) q_2 a$ for
$a > 0$.

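Theorem 1 Suppose that A holds.

(i) If $n^2\epsilon^d \to 0$, then $N_n \xrightarrow{P} 0$ as $n \to \infty$.
(ii) If $n^2\epsilon^d \to a$, $0 < a < \infty$, then $\mu = \lim_{n \to \infty} \mu_n$ and

$$N_n \xrightarrow{D} Po(\mu) \quad \text{as } n \to \infty.$$

(iii) If $n^2\epsilon^d \to \infty$ and $n\epsilon^d \to a$, $0 \le a \le \infty$, and $\zeta_{1,m} > 0$ when $a = \infty$, then

$$(N_n - \mu_n)/\sigma_n \xrightarrow{D} N(0, 1) \quad \text{as } n \to \infty.$$

Note that definition (3) implies $\zeta_{1,m} \ge 0$ with equality, e.g., if $\mathcal{P}$ is uniform.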

Remark 2. The following inference procedure is discussed in Leonenko and Seleznjev (2010) for
i.i.d. sequences. Let $c := \tfrac{1}{2} b_1(d) q_2$. By applying Theorem 1(ii) to the minimum inter-point distance
$Y_n = \min_{1 \le i < j \le n} \|X_i - X_j\|$ and $\epsilon = (t/c)^{1/d} n^{-2/d}$, for a fixed $t > 0$, i.e., $\mu_n \to \mu = t$, we get

$$P(c n^2 Y_n^d > t) = P(Y_n > \epsilon) = P(N_n = 0) \to e^{-\mu} = e^{-t} \quad \text{as } n \to \infty.$$

Hence $Z_n := c n^2 Y_n^d$ has asymptotically exponential distribution Exp(1) and an asymptotic
confidence interval for the quadratic functional $q_2$ can be written

$$I_n = \left[2c_1/(n^2 b_1(d) Y_n^d),\ 2c_2/(n^2 b_1(d) Y_n^d)\right]$$

for certain positive $c_1, c_2$.
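
The interval of Remark 2 can be sketched in a few lines of Python. The choice of $c_1, c_2$ as the equal-tailed quantiles of Exp(1) at level `alpha` is an assumption made here for illustration (the text only requires certain positive constants), and the function names are hypothetical.

```python
import math
from itertools import combinations


def unit_ball_volume(d):
    # b_1(d) = 2 * pi^(d/2) / (d * Gamma(d/2))
    return 2.0 * math.pi ** (d / 2) / (d * math.gamma(d / 2))


def q2_interval_from_min_distance(sample, alpha=0.05):
    # Asymptotic interval for q_2 based on Z_n = c n^2 Y_n^d being approximately Exp(1)
    n, d = len(sample), len(sample[0])
    y_min = min(math.dist(x, y) for x, y in combinations(sample, 2))  # Y_n
    c1 = -math.log(1.0 - alpha / 2.0)  # lower alpha/2 quantile of Exp(1) (assumed choice)
    c2 = -math.log(alpha / 2.0)        # upper alpha/2 quantile of Exp(1) (assumed choice)
    scale = n ** 2 * unit_ball_volume(d) * y_min ** d
    return 2.0 * c1 / scale, 2.0 * c2 / scale


# Toy one-dimensional sample; each observation is a 1-tuple
lower, upper = q2_interval_from_min_distance([(0.0,), (0.5,), (0.51,), (2.0,)])
```

Because the interval rests on a single extreme statistic, it is typically wide; the estimators of Section 2.2 use all small inter-point distances instead.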
2.2 Estimation of the entropy-type characteristics

We consider an estimator for the quadratic functional $q_2$ based on the normalized statistic
$Q_n = \binom{n}{2}^{-1} N_n$ defined as

$$\tilde{Q}_n = \tilde{Q}_{n,\epsilon} := Q_n/b_\epsilon(d).$$

Let $H_n := -\log(\max(\tilde{Q}_n, 1/n))$ be the corresponding estimator for the entropy $h_2$. The asymptotic
behavior of $\tilde{Q}_n$ and $H_n$ depends on the rate of decrease of $\epsilon(n)$. In the following theorem for
consistency, we give two versions for different asymptotic rates of $\epsilon(n)$, with significantly weaker
distribution assumptions in (ii).

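Theorem 2

(i) If A holds and $n^2\epsilon^d \to \infty$, then

$$\tilde{Q}_n \xrightarrow{P} q_2 \quad \text{and} \quad H_n \xrightarrow{P} h_2 \quad \text{as } n \to \infty.$$

(ii) Let $p(x) \in L_2(R^d)$ and assume that $P(X_i \ne X_j) = 1$, for all $i \ne j$. If $n\epsilon^d \to a$, $0 < a \le \infty$,
then

$$\tilde{Q}_n \xrightarrow{P} q_2 \quad \text{and} \quad H_n \xrightarrow{P} h_2 \quad \text{as } n \to \infty.$$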
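
As an illustration of the estimators in Theorem 2, the following Python sketch computes $\tilde{Q}_n$ and $H_n$ for a simulated MA(2) sample with standard normal marginal, for which $h_2 = \log(2\sqrt{\pi})$. The function name, the seed, and the value of ε are illustrative choices, not prescriptions from the paper.

```python
import math
import random
from itertools import combinations


def renyi_quadratic_estimates(sample, eps):
    # Returns (Q~_n, H_n) with Q~_n = N_n / (binom(n, 2) * b_eps(d)) and
    # H_n = -log(max(Q~_n, 1/n)); observations are tuples of floats.
    n, d = len(sample), len(sample[0])
    n_close = sum(1 for x, y in combinations(sample, 2) if math.dist(x, y) <= eps)
    b_eps = eps ** d * 2.0 * math.pi ** (d / 2) / (d * math.gamma(d / 2))
    q_tilde = n_close / (math.comb(n, 2) * b_eps)
    h_n = -math.log(max(q_tilde, 1.0 / n))
    return q_tilde, h_n


# MA(2) process with equal weights 1/sqrt(3): a 2-dependent sequence with N(0, 1) marginal
rng = random.Random(1)
z = [rng.gauss(0.0, 1.0) for _ in range(502)]
sample = [((z[t] + z[t + 1] + z[t + 2]) / math.sqrt(3),) for t in range(500)]
q_tilde, h_n = renyi_quadratic_estimates(sample, eps=0.1)
true_h2 = math.log(2.0 * math.sqrt(math.pi))
```

The truncation at $1/n$ only matters when no ε-close pairs are observed, in which case it keeps the logarithm finite.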

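Let $\tilde{q}_{2,\epsilon} := q_{2,\epsilon}/b_\epsilon(d)$ and $\tilde{h}_{2,\epsilon} := -\log(\tilde{q}_{2,\epsilon}) = h_{2,\epsilon} + \log(b_\epsilon(d))$. Next we show asymptotic
normality properties for the estimators $\tilde{Q}_n$ and $H_n$ when $n$ and $\epsilon$ vary accordingly. Let $\nu := 2q_2/b_1(d)$
and recall definition (3) of $\zeta_{1,m}$.

Theorem 3 Suppose that A holds and $n^2\epsilon^d \to \infty$.

(i) If $n\epsilon^d \to a$, $0 < a \le \infty$, and $\zeta_{1,m} > 0$ when $a = \infty$, then

$$\sqrt{n}(\tilde{Q}_n - \tilde{q}_{2,\epsilon}) \xrightarrow{D} N(0, \nu/a + 4\zeta_{1,m}) \quad \text{and} \quad \sqrt{n}\,\tilde{Q}_n(H_n - \tilde{h}_{2,\epsilon}) \xrightarrow{D} N(0, \nu/a + 4\zeta_{1,m}) \quad \text{as } n \to \infty.$$

(ii) If $n\epsilon^d \to 0$, then

$$n\epsilon^{d/2}(\tilde{Q}_n - \tilde{q}_{2,\epsilon}) \xrightarrow{D} N(0, \nu) \quad \text{and} \quad n\epsilon^{d/2}\tilde{Q}_n(H_n - \tilde{h}_{2,\epsilon}) \xrightarrow{D} N(0, \nu) \quad \text{as } n \to \infty.$$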



To evaluate the quadratic functional $q_2$ and the entropy $h_2$, we introduce smoothness conditions
for the marginal density $p(x)$. Denote by $H_2^{(\alpha)}(K)$, $0 < \alpha \le 1$, $K > 0$, a linear space of functions
in $L_2(R^d)$ satisfying an α-Hölder condition in $L_2$-norm with constant $K$, i.e., if $p(x) \in H_2^{(\alpha)}(K)$ and
$h \in R^d$, then

$$\left(\int_{R^d} (p(x + h) - p(x))^2\,dx\right)^{1/2} \le K|h|^\alpha. \qquad (4)$$

Note that (4) holds, e.g., if for some function $g(x) \in L_2(R^d)$,

$$|p(x + h) - p(x)| \le g(x)|h|^\alpha.$$

There are different ways to define the density smoothness, e.g., by the conventional or pointwise
Hölder conditions (Leonenko and Seleznjev, 2010, Kallberg et al., 2012) or the Fourier
characterization (Gine and Nickl, 2008).

The rate of convergence in probability can now be described in terms of the smoothness of $p(x)$.
Let $L(n), n \ge 1$, be a slowly varying function as $n \to \infty$.
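Theorem 4 Let A hold and assume that $p(x) \in H_2^{(\alpha)}(K)$.

(i) Then the bias $|E\tilde{Q}_n - q_2| \le \tfrac{1}{2} K^2\epsilon^{2\alpha} + o(1/(n\epsilon^{d/2}))$ as $n \to \infty$.
(ii) If $0 < \alpha \le d/4$ and $\epsilon \sim c\,n^{-2/(4\alpha+d)}$, $c > 0$, then

$$\tilde{Q}_n - q_2 = O_P(n^{-\frac{4\alpha}{4\alpha+d}}) \quad \text{and} \quad H_n - h_2 = O_P(n^{-\frac{4\alpha}{4\alpha+d}}) \quad \text{as } n \to \infty.$$

(iii) If $\alpha > d/4$ and $\epsilon \sim L(n) n^{-1/d}$ and $n\epsilon^d \to a$, $0 < a \le \infty$, then

$$\tilde{Q}_n - q_2 = O_P(n^{-1/2}) \quad \text{and} \quad H_n - h_2 = O_P(n^{-1/2}) \quad \text{as } n \to \infty.$$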
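
To see how the smoothness index enters, consider a worked instance of part (ii) in dimension $d = 1$, reading the radius and rate there as $\epsilon \sim c\,n^{-2/(4\alpha+d)}$ and $n^{-4\alpha/(4\alpha+d)}$. The values of α below are chosen only for illustration.

```latex
\begin{align*}
\alpha = \tfrac{1}{4},\ d = 1:&\quad \epsilon \sim c\,n^{-2/(1+1)} = c\,n^{-1},
  &\tilde{Q}_n - q_2 &= O_P\big(n^{-1/2}\big),\\
\alpha = \tfrac{1}{8},\ d = 1:&\quad \epsilon \sim c\,n^{-2/(1/2+1)} = c\,n^{-4/3},
  &\tilde{Q}_n - q_2 &= O_P\big(n^{-1/3}\big).
\end{align*}
```

In both cases $n^2\epsilon^d \to \infty$, and the boundary value $\alpha = d/4$ already gives the parametric rate $n^{-1/2}$, which is the rate stated in part (iii) for smoother densities.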

Remark 3. Since the aim of this paper is to provide asymptotic properties for estimation under 
dependence, we leave questions regarding efficiency and optimality of the obtained convergence rates 
for further research. Nevertheless, it could be mentioned that, in the independent one-dimensional 
case (m = 0, d = 1), Bickel and Ritov (1988) show that the rates in Theorem 4 are optimal in a
certain sense (see also Laurent, 1996, Gine and Nickl, 2008). 

In order to make the normal limit results of Theorem 3 practical, e.g., to calculate approximate
confidence intervals, the asymptotic variances have to be estimated. In particular, we need a
consistent estimate of the characteristic $\zeta_{1,m}$. Assuming that $m$ is exactly known might be too
strong in our non-parametric setting. However, note that $\zeta_{1,m} = \zeta_{1,r}$ for $r \ge m$ (the covariances
in (3) vanish for lags $h > m$), so under the less restrictive assumption that a bound $r$ for $m$ is known,
we can use a consistent estimator of $\zeta_{1,r}$. To construct this estimator, for $h = 0, 1, \ldots$, consider the
following estimator of $q_{3,h} := E\,p(X_1)p(X_{1+h})$,

$$U_{h,n} = U_{h,n,\epsilon_0} := M_{h,n}^{-1} b_{\epsilon_0}(d)^{-2} \sum_{(i,j,k) \in \mathcal{E}_{h,n}} I(d(X_i, X_j) \le \epsilon_0,\ d(X_{i+h}, X_k) \le \epsilon_0), \quad \epsilon_0 > 0,$$

where $\mathcal{E}_{h,n} := \{(i, j, k) : 1 \le i \le n - (h + 1),\ j, k \ne i, i + h,\ j \ne k\}$ and the number of summands
$M_{h,n} := |\mathcal{E}_{h,n}| = (n - (h + 1))(n - 2)(n - 3)$. Let $\epsilon_0 = \epsilon_0(n) \to 0$ as $n \to \infty$.

Proposition 2 If A holds and $n\epsilon_0^d \to c$, $0 < c \le \infty$, then

$$U_{h,n} \xrightarrow{P} q_{3,h} \quad \text{as } n \to \infty.$$

Remark 4. Under the conditions of Proposition 2, we have

$$H_{3,n} := -\tfrac{1}{2}\log(\max(U_{0,n}, 1/n)) \xrightarrow{P} -\tfrac{1}{2}\log\left(\int_{R^d} p(x)^3\,dx\right) = h_3 \quad \text{as } n \to \infty,$$

and thus $H_{3,n}$ is a consistent estimator of the cubic Renyi entropy $h_3$.

A consistent plug-in estimator for $\zeta_{1,r}$ can be set up according to

$$z_{1,r,n} := U_{0,n} - \tilde{Q}_n^2 + 2\sum_{h=1}^{r}(U_{h,n} - \tilde{Q}_n^2),$$

where it is assumed that the sequence $\epsilon_0 = \epsilon_0(n)$ satisfies $n\epsilon_0^d \to c$, $0 < c \le \infty$.

Now we construct asymptotically pivotal quantities by using Theorem 3, the smoothness of the
marginal density, and variance estimators. To achieve $\sqrt{n}$-rate of convergence, an upper bound
$r \ge m$ has to be available. Let $w_{r,n}^2 := 2\tilde{Q}_n/(n b_\epsilon(d)) + 4\max(z_{1,r,n}, 1/n)$ be the corresponding
consistent estimator of $\nu/a + 4\zeta_{1,m}$ when $n\epsilon^d \to a$, $0 < a \le \infty$.
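
The variance estimator can be computed in quadratic time per lag by counting neighbours instead of enumerating triples. The Python sketch below is one possible arrangement under that counting identity; the function names are illustrative, the normalization $M_{h,n}$ is the one quoted in the text, and nothing here is claimed to be the authors' implementation.

```python
import math


def unit_ball_volume(d):
    # b_1(d) = 2 * pi^(d/2) / (d * Gamma(d/2))
    return 2.0 * math.pi ** (d / 2) / (d * math.gamma(d / 2))


def close_matrix(sample, e):
    # close[i][j] is True when d(X_i, X_j) <= e
    return [[math.dist(x, y) <= e for y in sample] for x in sample]


def u_statistic(close, h, ball_volume):
    # U_{h,n}: normalized count of triples (i, j, k), j and k outside {i, i+h}, j != k,
    # with X_j close to X_i and X_k close to X_{i+h}.
    n = len(close)
    total = 0
    for i in range(n - (h + 1)):  # 0-based version of 1 <= i <= n - (h + 1)
        k0 = i + h
        near_i = {j for j in range(n) if j not in (i, k0) and close[i][j]}
        near_k = {k for k in range(n) if k not in (i, k0) and close[k0][k]}
        # all (j, k) combinations minus those with j == k
        total += len(near_i) * len(near_k) - len(near_i & near_k)
    m_hn = (n - (h + 1)) * (n - 2) * (n - 3)  # M_{h,n} as given in the text
    return total / (m_hn * ball_volume ** 2)


def variance_estimate(sample, eps, eps0, r):
    # Returns (Q~_n, z_{1,r,n}, w_{r,n}^2) for a sample of tuples
    n, d = len(sample), len(sample[0])
    b_eps = unit_ball_volume(d) * eps ** d
    b_eps0 = unit_ball_volume(d) * eps0 ** d
    close = close_matrix(sample, eps)
    n_close = sum(close[i][j] for i in range(n) for j in range(i + 1, n))
    q_tilde = n_close / (math.comb(n, 2) * b_eps)
    close0 = close_matrix(sample, eps0)
    u = [u_statistic(close0, h, b_eps0) for h in range(r + 1)]
    z = u[0] - q_tilde ** 2 + 2.0 * sum(u[h] - q_tilde ** 2 for h in range(1, r + 1))
    w2 = 2.0 * q_tilde / (n * b_eps) + 4.0 * max(z, 1.0 / n)
    return q_tilde, z, w2
```

A typical call would be `variance_estimate(sample, eps=0.1, eps0=0.1, r=6)`, mirroring the parameter values of Example 2; the truncation `max(z, 1/n)` guarantees a positive variance estimate in small samples.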
Theorem 5 Let A hold and assume that $p(x) \in H_2^{(\alpha)}(K)$, $\alpha > d/4$, and $r \ge m$. If $\epsilon \sim L(n) n^{-1/d}$
and $n\epsilon^d \to a$, $0 < a \le \infty$, and $\zeta_{1,m} > 0$ when $a = \infty$, then

$$\sqrt{n}(\tilde{Q}_n - q_2)/w_{r,n} \xrightarrow{D} N(0, 1) \quad \text{and} \quad \sqrt{n}\,\tilde{Q}_n(H_n - h_2)/w_{r,n} \xrightarrow{D} N(0, 1) \quad \text{as } n \to \infty.$$

Next we apply Theorem 3(ii) to weaken the condition $\alpha > d/4$ and therefore get asymptotic
normality for less smooth cases. Additionally, asymptotically pivotal quantities can be built even
without a bound $r$ available for $m$. Note, however, that the obtained rate of convergence is slower
than $\sqrt{n}$. Define a consistent estimator for $\nu$ as $u_n^2 := 2\max(\tilde{Q}_n, 1/n)/b_1(d)$. Let
$C_\beta := \beta/(2 - \beta)$, $0 < \beta < 1$.

Theorem 6 Let A hold and assume that $p(x) \in H_2^{(\alpha)}(K)$ and $n^2\epsilon^d \to \infty$.
(i) If $\alpha > (d/4)C_\beta$, for some $0 < \beta < 1$, and $\epsilon \sim c\,n^{-(2-\beta)/d}$, $c > 0$, then

$$n^{\beta/2}c^{d/2}(\tilde{Q}_n - q_2)/u_n \xrightarrow{D} N(0, 1) \quad \text{and} \quad n^{\beta/2}c^{d/2}\tilde{Q}_n(H_n - h_2)/u_n \xrightarrow{D} N(0, 1) \quad \text{as } n \to \infty.$$

$$L(n)(\tilde{Q}_n - q_2)/u_n \xrightarrow{D} N(0, 1) \quad \text{and} \quad L(n)\tilde{Q}_n(H_n - h_2)/u_n \xrightarrow{D} N(0, 1) \quad \text{as } n \to \infty.$$

The practical applicability of the results in this paper relies on an accurate choice of the parameter 
e. One possibility is to use the cross-validation techniques for choosing the optimal bandwidth for 
density estimation, see, e.g., Hart and Vieu (1990). However, the problem of finding a suitable e is 
a topic for future research. 

In the following examples, let $\{Z_i\}$ be a sequence of i.i.d. $N(0, 1)$-variables.

Example 2. We consider estimation of the quadratic Renyi entropy $h_2(\mathcal{P})$ for the 2-dependent
moving average MA(2) process $X_t = \theta_0 Z_t + \theta_1 Z_{t-1} + \theta_2 Z_{t-2}$, where $\theta_2 = \theta_1 = \theta_0 = 1/\sqrt{3}$. In this
case $\mathcal{P} = N(0, 1)$ and $h_2(\mathcal{P}) = \log(2\sqrt{\pi})$. We simulate $N_{sim} = 500$ independent and normalized
residuals $R_n^{(i)} := \sqrt{n}\,\tilde{Q}_n(H_n - h_2)/w_{r,n}$, $i = 1, \ldots, N_{sim}$, with $n = 500$, $r = 6$, and $\epsilon = \epsilon_0 = 1/10$.
The histogram and normal quantile plot in Figure 1 illustrate the performance of the normal
approximation for $R_n^{(i)}$ implied by Theorem 5. The p-value (0.60) for the Kolmogorov-Smirnov test
also supports the hypothesis of standard normality for the residuals.

Example 3. Consider the sequence $X_t := \exp(\theta_0 Z_t + \theta_1 Z_{t-1})$, where $\theta_0 = \sqrt{3}/2$, $\theta_1 = -1/2$,
i.e., $\{X_i\}$ is a 1-dependent log-normal sequence, $X_t \stackrel{D}{=} \exp(Z_t)$. In this case the quadratic entropy
$h_2(\mathcal{P}) = -\log(e^{1/4}/(2\sqrt{\pi}))$. Figure 2 shows the accuracy of the normal approximation in Theorem 5
for the residuals $R_n^{(i)} := \sqrt{n}(\tilde{Q}_n - q_2)/w_{r,n}$, $i = 1, \ldots, N_{sim}$, where $n = 500$, $r = 4$, and
$\epsilon = \epsilon_0 = 3/100$. The histogram, normal quantile plot, and p-value (0.36) of the Kolmogorov-Smirnov test
imply that the normality hypothesis cannot be rejected.

Example 4. We consider estimation of the quadratic functional $q_2(\mathcal{P})$ for the 1-dependent Cauchy sequence
$X_t = Z_t/Z_{t+1}$. Here $\mathcal{P}$ is the Cauchy distribution with $q_2(\mathcal{P}) = 1/(2\pi)$. We simulate residuals
$R_n^{(i)} := n\epsilon^{d/2}(\tilde{Q}_n - q_2)/u_n$, $i = 1, \ldots, N_{sim}$, where $n = 500$, $\epsilon = 1/100$, and $N_{sim} = 500$. Figure 3
illustrates the performance of the normal approximation of $R_n^{(i)}$ indicated by Theorem 6. The
histogram, normal quantile plot, and p-value (0.47) for the Kolmogorov-Smirnov test do not
contradict the hypothesis of standard normality.
Figure 1: A 2-dependent MA(2) time series; sample size n = 500, r = 6, and ε = ε_0 = 1/10.
Standard normal approximation for the normalized residuals; N_sim = 500.



[Figure 2 panels: histogram "Log-normal sample" and "Normal quantile plot" (horizontal axis: normal quantiles).]

Figure 2: A 1-dependent log-normal sequence; sample size n = 500, r = 4, and ε = ε_0 = 3/100.
Standard normal approximation for the normalized residuals; N_sim = 500.
[Figure 3 panels: histogram "Cauchy sample" and "Normal quantile plot".]

Figure 3: A 1-dependent Cauchy sequence; sample size n = 500 and ε = 1/100. Standard normal
approximation for the normalized residuals; N_sim = 500.

3 Applications 

ε-Keys in time series databases

Let a time series database $T$ be a matrix with $n$ random records (or tuples) $r_U(j), j = 1, \ldots, n$,
and $k$ attributes, $U = \{1, \ldots, k\}$, with continuous tuple distribution with density $p(x) = p_U(x), x \in R^k$.
In contrast to conventional static databases, the ordering of records in $T$ is significant, i.e.,
the timestamp $j$ can be associated with an additional attribute for $r_U(j)$. For example, time series
databases are used for modelling stock market, environmental (e.g., weather) or web usage data
(see, e.g., Last et al., 2001). Then the database $T$ can be considered as a sample from a vector
time series $\{r_U(t)\}_{t=1}^{\infty}$. Assume additionally that $\{r_U(t)\}$ is a stationary m-dependent time series.
A subset $A \subset U$, $|A| = d < k$, is called an ε-key if $N_n(A) = 0$, i.e., no two of the sub-records
$r_A(j), j = 1, \ldots, n$, are ε-close in the attributes $A$. The distribution of $N_n(A)$ characterizes the capability of $A$
to distinguish records in $T$ and can be used to measure the complexity of a database design for
further optimization, e.g., for optimal ε-key selection or searching dependencies between attributes
(or association rules) (see, e.g., Thalheim, 2000, Seleznjev and Thalheim, 2008, Leonenko and
Seleznjev, 2010). Now Theorem 1(ii) gives an approximation of the probability that $A$ is an ε-key,
$P\{N_n(A) = 0\} \sim e^{-\mu_n}$, where

$$\mu_n = \binom{n}{2}\epsilon^d b_1(d)\,\tilde{q}_{2,\epsilon} + o(n\epsilon^{d/2}) \sim \tfrac{1}{2} a\,b_1(d)\,q_2 \quad \text{as } n \to \infty,\ a > 0,$$

i.e., asymptotically optimal ε-key candidates are amongst the sets $A$, $|A| = d$, with minimal value of the
quadratic functional $q_2$, and the corresponding estimators of $q_2$ are applicable with various
asymptotics for $\epsilon$ and $n$ (see Remark 2 and the theorems of Section 2.2).
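
The Poisson approximation for the ε-key probability is easy to evaluate once an estimate of $q_2$ is available. The following Python sketch is illustrative only: the function name is hypothetical, and $q_2$ is plugged in for $\tilde{q}_{2,\epsilon}$, which is an assumption justified only for small ε.

```python
import math


def epsilon_key_probability(n, eps, d, q2):
    # Approximates P{N_n(A) = 0} by exp(-mu_n), mu_n = binom(n, 2) * eps^d * b_1(d) * q_2
    b1 = 2.0 * math.pi ** (d / 2) / (d * math.gamma(d / 2))
    mu_n = math.comb(n, 2) * eps ** d * b1 * q2
    return math.exp(-mu_n)


# Two hypothetical attribute subsets of the same size d = 1 with different q_2 values
p_concentrated = epsilon_key_probability(n=100, eps=0.001, d=1, q2=0.5)
p_spread_out = epsilon_key_probability(n=100, eps=0.001, d=1, q2=0.1)
```

The subset with the smaller $q_2$ has the larger approximate probability of being an ε-key, in line with the selection rule stated above.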
Entropy maximizing distributions for stationary m-dependent sequences

Note that conditions of consistency for our estimate of the quadratic Renyi entropy $h_2(\mathcal{P})$ (see
Theorem 2(ii)) are rather weak and can be easily verified for many statistical models. Hence,
one can use these consistent estimators to build goodness of fit tests based on the maximum
entropy principle, see, e.g., Goria et al. (2005) and Leonenko and Seleznjev (2010), where similar
approaches were proposed for Shannon and Renyi entropies, respectively. Let us recall some
known facts about the maximum entropy principle, see, e.g., Johnson and Vignat (2007). Consider
the following maximization problem: given a symmetric positive definite matrix $\Sigma$, for all
densities $p(x)$ with mean $\mu$ and such that

$$\int_{R^d} p(x)(x - \mu)(x - \mu)^T dx = \Sigma, \quad \mathrm{supp}\{p\} = \Omega_0 := \{x \in R^d : (x - \mu)^T\Sigma^{-1}(x - \mu) \le 4 + d\}, \qquad (5)$$

the quadratic Renyi entropy $h_2(\mathcal{P})$ is uniquely maximized by the distribution $\mathcal{P}^*$ with density

$$p^*(x) = \frac{\Gamma(2 + d/2)}{\pi^{d/2}(4 + d)^{d/2}|\Sigma|^{1/2}}\left(1 - (4 + d)^{-1}(x - \mu)^T\Sigma^{-1}(x - \mu)\right), \quad x \in \Omega_0, \qquad (6)$$

that is, for all other densities with support $\Omega_0$, mean $\mu$, and covariance matrix $\Sigma$, see (5), we have
$h_2(\mathcal{P}) \le h_2(\mathcal{P}^*)$, with equality if and only if $p = p^*$ almost everywhere with respect to the Lebesgue
measure in $R^d$. The distribution $\mathcal{P}^*$ belongs to the class of multivariate Pearson type II distributions
(or Student-r distributions) and its quadratic Renyi entropy is

$$h_2(\mathcal{P}^*) = -\log\left(\frac{2\,\Gamma(2 + d/2)^2}{\Gamma(3 + d/2)\,\pi^{d/2}(4 + d)^{d/2}|\Sigma|^{1/2}}\right). \qquad (7)$$

For an i.i.d. sample, the goodness of fit test based on the maximum quadratic Renyi entropy
principle was proposed by Leonenko and Seleznjev (2010). To generalize this test for m-dependent data,
we need to show that there exists a stationary m-dependent sequence with marginal distribution
(6). For one-dimensional processes, one can apply some results from Joe (1997). Henceforth, we use
some definitions and notation from this book. It is known that for continuous multivariate
distributions, the univariate marginals and the multivariate or dependence structure can be separated by
a copula. Let $C(u, v)$ be a bivariate copula with conditional distribution $C_{2|1}(v|u) = \partial C(u, v)/\partial u$.
The inverse conditional distribution is denoted by $C_{2|1}^{-1}(s|u)$. Let $F$ be a continuous univariate
distribution function and let $\{U_i\}$ be a sequence of i.i.d. uniformly distributed $U(0, 1)$-random
variables. A 1-dependent sequence with marginal distribution $F$ is $Y_t = h(U_t, U_{t+1})$, where
$h(u, v) = F^{-1}[C_{2|1}^{-1}(v|u)]$. The marginal distribution of $Y_t$ is $F(y)$, see Joe (1997), p. 253, and the
joint distribution of $(Y_t, Y_{t+1})$ is of the form

$$P(Y_t \le x, Y_{t+1} \le y) = \int_0^1 C\left(C_{2|1}(F(x)|u), F(y)\right) du.$$

In the same spirit, one can construct a stationary m-dependent sequence with given marginal
distribution for any $m \ge 1$, see again Joe (1997), p. 255. In particular, for any $m \ge 1$, there
exists a one-dimensional stationary m-dependent sequence with marginal distribution $F$, which has
density (6) for $d = 1$. For $d > 1$, the above copula construction seems difficult, but one can use
again the results from Johnson and Vignat (2007), reformulated for quadratic entropy as follows:
if $\{Z_i\}$ is a sequence of i.i.d. $N(0, 1)$-variables, then

$$\mu + \frac{\sqrt{d + 4}\,\Sigma^{1/2} Z}{\sqrt{\|Z\|^2 + Z_{d+1}^2 + \cdots + Z_{d+4}^2}} =: f(Z_1, \ldots, Z_{d+4})$$

has density (6), where $Z := (Z_1, \ldots, Z_d)^T$. Now a stationary m-dependent sequence with the
marginal density (6) can be defined for all $m \in \{1, \ldots, d + 3\}$, e.g., $X_t := f(Z_t, \ldots, Z_{t+d+3})$, $t \ge 1$,
$m = d + 3$. Consequently, for any fixed $\mu$ and $m \in \{1, \ldots, d + 3\}$, there exists a stationary
m-dependent vector sequence with marginal maximum quadratic entropy distribution (6). Next,
from (7) we get

$$K_d := \frac{e^{h_2(\mathcal{P}^*)}}{|\Sigma|^{1/2}} = \frac{\Gamma(3 + d/2)\,\pi^{d/2}(4 + d)^{d/2}}{2\,\Gamma(2 + d/2)^2}. \qquad (8)$$
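
The sliding-window construction $X_t := f(Z_t, \ldots, Z_{t+d+3})$ described above can be sketched as follows. This Python example assumes $\mu = 0$ and $\Sigma = I$, and it reads the map $f$ as $\sqrt{d+4}\,Z$ divided by the Euclidean norm of the full window of $d + 4$ innovations; both are assumptions made for the illustration, as is the function name.

```python
import math
import random


def max_entropy_sequence(n, d, rng):
    # X_t = f(Z_t, ..., Z_{t+d+3}) with mu = 0 and Sigma = I (assumed for simplicity).
    # Overlapping windows of length d + 4 make the sequence (d + 3)-dependent.
    z = [rng.gauss(0.0, 1.0) for _ in range(n + d + 3)]
    out = []
    for t in range(n):
        window = z[t:t + d + 4]
        norm = math.sqrt(sum(v * v for v in window))
        out.append(tuple(math.sqrt(d + 4) * v / norm for v in window[:d]))
    return out


rng = random.Random(7)
xs = max_entropy_sequence(5000, 2, rng)
```

Every generated vector lies in the ball of squared radius $d + 4$, matching the support $\Omega_0$ in (5) for $\mu = 0$ and $\Sigma = I$.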



Let $\mathcal{K}$ be a class of d-dimensional density functions $p(x), x \in R^d$, with support $\mathrm{supp}\{p\} = \Omega_0$,
which satisfy condition (5). Note that the density $p^*(x)$ belongs to this class. Let $X_1, \ldots, X_n$, $n \ge 2$,
be an m-dependent sample from a member of $\mathcal{K}$. Consider
$\Sigma_n := (n - 1)^{-1}\sum_{i=1}^{n}(X_i - \bar{X}_n)(X_i - \bar{X}_n)^T$, the sample covariance matrix, as a consistent
estimate of $\Sigma$ as $n \to \infty$, where the sample mean $\bar{X}_n := n^{-1}\sum_{i=1}^{n} X_i$. The consistency of the
sample covariance matrix is a simple consequence of the asymptotic properties of stationary
m-dependent processes, see, e.g., Lee (1990), Anderson (1994). Then under the null hypothesis
$H_0$: $X_1, \ldots, X_n$ is a sample from m-dependent data with
density $p^*(x)$, we obtain from the Slutsky theorem and Theorem 2(ii) that

$$|\Sigma_n|^{-1/2}\exp\{H_n\} \xrightarrow{P} K_d \quad \text{as } n \to \infty,$$

where $K_d$ is defined in (8) and $H_n$ is the consistent estimator of the quadratic Renyi entropy.
Under the alternative $H_1$: $X_1, \ldots, X_n$ is a sample from any other member $p$ of $\mathcal{K}$, we find that

$$|\Sigma_n|^{-1/2}\exp\{H_n\} \xrightarrow{P} \frac{e^{h_2(\mathcal{P})}}{|\Sigma|^{1/2}} < K_d \quad \text{as } n \to \infty.$$

In other words, the above mentioned test is consistent against such alternatives. Note that
$K_1 = \tfrac{5}{3}\sqrt{5} \approx 3.727$, $K_2 = \tfrac{9}{2}\pi \approx 14.137$, $K_3 = \tfrac{98}{15}\pi\sqrt{7} \approx 54.304$.



4 Proofs 

The following lemmas are used in the subsequent proofs. 

Lemma 1 (Ch. 2, Lee, 1990) Let $k \ge 2$. Then the number of k-tuples of integers
$1 \le i_1 < \cdots < i_k \le n$ that satisfy $i_j - i_{j-1} > h$ for $j = 2, 3, \ldots, k$ is $\binom{n - (k-1)h}{k}$.

Lemma 2 (Ch. 5, Billingsley, 1995) If, for each $k$, $Z_n^{(k)} \xrightarrow{D} Z^{(k)}$ as $n \to \infty$, if $Z^{(k)} \xrightarrow{D} Z$ as
$k \to \infty$, and if

$$\lim_{k \to \infty}\limsup_{n \to \infty} P(|Z_n^{(k)} - Z_n| > \delta) = 0,$$

for positive $\delta$, then $Z_n \xrightarrow{D} Z$.
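Lemma 3

(i) Assume that $p(x) \in L_3(R^d)$ and let $\tilde{p}_{X,\epsilon}(x) := b_\epsilon(d)^{-1} p_{X,\epsilon}(x)$. Then, for $h = 0, 1, \ldots$,
$$E\,\tilde{p}_{X,\epsilon}(X) = \tilde{q}_{2,\epsilon} \to q_2 \quad \text{and} \quad E\,\tilde{p}_{X,\epsilon}(X_1)\tilde{p}_{X,\epsilon}(X_{1+h}) \to E\,p(X_1)p(X_{1+h}) \quad \text{as } \epsilon \to 0,$$
and thus $b_\epsilon(d)^{-2}\sigma_{1,h,\epsilon}^2 \to \mathrm{Cov}(p(X_1), p(X_{1+h}))$ as $\epsilon \to 0$.

(ii) If A is satisfied, then

$$\sup_{i \ne j} P(d(X_i, X_j) \le \epsilon) = o(\epsilon^{d/2}) \quad \text{as } \epsilon \to 0.$$

(iii) If A is satisfied, then

$$\sup_{\{i_1, i_2\} \ne \{j_1, j_2\}} P(d(X_{i_1}, X_{j_1}) \le \epsilon,\ d(X_{i_2}, X_{j_2}) \le \epsilon) = o(\epsilon^d) \quad \text{as } \epsilon \to 0.$$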

Proof. (i) We use the following result from Lemma 1 in Kallberg and Seleznjev (2012): for random
vectors $X$ and $Y$ with densities $p_X(x), p_Y(x) \in L_{a+1}(R^d)$, $x \in R^d$, $a > 0$,

$$E\,\tilde{p}_{X,\epsilon}(Y)^a \to E\,p_X(Y)^a \quad \text{as } \epsilon \to 0. \qquad (9)$$

Note that (9) immediately implies $\tilde{q}_{2,\epsilon} \to q_2$. Furthermore, if $Y$ is defined to have density
$p_Y(x) := p(x)^2/q_2$, $x \in R^d$, we obtain from (9) that

$$E\,\tilde{p}_{X,\epsilon}(X_1)p(X_1) = q_2\,E\,\tilde{p}_{X,\epsilon}(Y) \to q_2\,E\,p(Y) = E\,p(X_1)^2 \quad \text{as } \epsilon \to 0. \qquad (10)$$

Now consider the decomposition

$$E\,\tilde{p}_{X,\epsilon}(X_1)\tilde{p}_{X,\epsilon}(X_{1+h}) = E\,p(X_1)p(X_{1+h}) + E(\tilde{p}_{X,\epsilon}(X_1) - p(X_1))\tilde{p}_{X,\epsilon}(X_{1+h}) + E(\tilde{p}_{X,\epsilon}(X_{1+h}) - p(X_{1+h}))p(X_1). \qquad (11)$$

By the stationarity of $\{X_i\}$ and Hölder's inequality, the last two terms in (11) are in absolute value
bounded by

$$\{E(\tilde{p}_{X,\epsilon}(X_1) - p(X_1))^2\}^{1/2}\{E\,\tilde{p}_{X,\epsilon}(X_1)^2\}^{1/2},$$

which by (9) and (10) tends to zero as $\epsilon \to 0$. Hence, $E\,\tilde{p}_{X,\epsilon}(X_1)\tilde{p}_{X,\epsilon}(X_{1+h}) \to E\,p(X_1)p(X_{1+h})$ and
the assertion follows.



(ii) First note that the indices $t_3$ and $t_4$ can be chosen in such a way that $X_{t_3}$ and $X_{t_4}$ are
independent and also independent of $\{X_{t_1}, X_{t_2}\}$. For the corresponding density of $(X_{t_1}, X_{t_2}, X_{t_3}, X_{t_4})$,
we have $p_t(x_1, x_2, x_3, x_4) = p_{t_1,t_2}(x_1, x_2)p(x_3)p(x_4)$, where $p_{t_1,t_2}(x_1, x_2)$ is the density of $(X_{t_1}, X_{t_2})$.
Consequently, from assumption A,

$$g_t(x_1, x_4) \in L_1(R^{2d}),$$

$$g_t(x_1, x_4) := \left(\int_{R^{2d}} p_{t_1,t_2}(x_1, x_2)^2 p(x_3)^2 p(x_4)^2\,dx_2\,dx_3\right)^{1/2} = p(x_4)\,q_2^{1/2}\left(\int_{R^d} p_{t_1,t_2}(x_1, x_2)^2\,dx_2\right)^{1/2}.$$

Integrating $g_t(x_1, x_4)$ with respect to $x_4$ gives

$$g_{t_1,t_2}(x_1) \in L_1(R^d), \quad g_{t_1,t_2}(x_1) := \left(\int_{R^d} p_{t_1,t_2}(x_1, x_2)^2\,dx_2\right)^{1/2}.$$

Therefore, by Hölder's inequality,

$$P(d(X_{t_1}, X_{t_2}) \le \epsilon) = \int_{\|x-y\| \le \epsilon} p_{t_1,t_2}(x, y)\,dx\,dy = \int_{R^d}\left(\int_{\|x-y\| \le \epsilon} p_{t_1,t_2}(x, y)\,dy\right)dx \qquad (12)$$

$$\le \int_{R^d}\left(\int_{\|x-y\| \le \epsilon} dy\right)^{1/2}\left(\int_{\|x-y\| \le \epsilon} p_{t_1,t_2}(x, y)^2\,dy\right)^{1/2} dx = b_\epsilon(d)^{1/2}\int_{R^d} h_{t_1,t_2}^{(\epsilon)}(x)\,dx,$$

where

$$h_{t_1,t_2}^{(\epsilon)}(x) := \left(\int_{\|x-y\| \le \epsilon} p_{t_1,t_2}(x, y)^2\,dy\right)^{1/2} \le \left(\int_{R^d} p_{t_1,t_2}(x, y)^2\,dy\right)^{1/2} = g_{t_1,t_2}(x).$$

Since $g_{t_1,t_2}(x) \in L_1(R^d)$ and $h_{t_1,t_2}^{(\epsilon)}(x) \to 0$ as $\epsilon \to 0$, the dominated convergence theorem yields
$\int_{R^d} h_{t_1,t_2}^{(\epsilon)}(x)\,dx \to 0$ as $\epsilon \to 0$, and hence from (12) we obtain

$$P(d(X_{t_1}, X_{t_2}) \le \epsilon) = o(\epsilon^{d/2}) \quad \text{as } \epsilon \to 0, \qquad (13)$$

for each distinct pair $\{t_1, t_2\}$. Finally, the stationarity and m-dependence of the sequence $\{X_i\}$
imply that $P(d(X_{t_1}, X_{t_2}) \le \epsilon)$ attains a finite number of values as $t_1$ and $t_2$ vary. Thus, the rate of
convergence in (13) remains valid if we take the supremum over distinct $\{t_1, t_2\}$ as desired, so the
statement follows.



(Hi) Since the argument is similar to that of (ii), we show the main steps only. First assume that 
i\ = i 2 and j\ 7^ j 2 . Under A, it can be shown in a similar way as above that for each 3-tuple of 
distinct positive integers (ti,t 2 ,t 3 ), the density Pt lt t 2 ,t 3 (xi, X2, x 3 ) of (X tl , X t2 , X ts ) satisfies 

9ti,t2,t 3 (xi) £ Li(R d ), gH,t2,t z {x x ) := fj^ p tl ,t 2 ,t 3 (xi, x 2 , x 3 ) 2 dx 2 dx 3 ^j 

Now introduce the random vectors $Y_i := (X_{i_1}, X_{i_2})$ and $Y_j := (X_{j_1}, X_{j_2})$ in $R^{2d}$ and note that

$$P(d(X_{i_1}, X_{j_1}) \le \epsilon,\ d(X_{i_2}, X_{j_2}) \le \epsilon) \le P(\|Y_i - Y_j\|_{2d} \le 2\epsilon), \tag{14}$$

where $\|\cdot\|_{2d}$ is the Euclidean norm in $R^{2d}$. Define the coordinate vectors $y_1 := (x_1, x_1)$ and $y_2 := (x_2, x_3)$. By Hölder's inequality,

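$$\begin{aligned}
P(\|Y_i - Y_j\|_{2d} \le 2\epsilon) &= \int_{\|y_1 - y_2\|_{2d} \le 2\epsilon} p_{i_1,j_1,j_2}(x_1, x_2, x_3)\, dx_1\, dx_2\, dx_3 \\
&\le b_{2\epsilon}(2d)^{1/2} \int_{R^d} h^{(\epsilon)}_{i_1,j_1,j_2}(x_1)\, dx_1,
\end{aligned} \tag{15}$$

where

$$h^{(\epsilon)}_{i_1,j_1,j_2}(x_1) := \left(\int_{\|y_1 - y_2\|_{2d} \le 2\epsilon} p_{i_1,j_1,j_2}(x_1, x_2, x_3)^2 \, dx_2\, dx_3\right)^{1/2}.$$

By the dominated convergence theorem, $\int_{R^d} h^{(\epsilon)}_{i_1,j_1,j_2}(x_1)\, dx_1 \to 0$ as $\epsilon \to 0$ and therefore, since $b_{2\epsilon}(2d)^{1/2} = C_d\, \epsilon^d$, $C_d > 0$, we get from (14) and (15) that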

$$P(d(X_{i_1}, X_{j_1}) \le \epsilon,\ d(X_{i_2}, X_{j_2}) \le \epsilon) = o(\epsilon^d) \quad \text{as } \epsilon \to 0, \tag{16}$$

where $i_1 = i_2$ and $j_1 \ne j_2$. By a similar argument, this is also valid when $i_1 \ne i_2$ and $j_1 \ne j_2$. Finally, from the stationarity and $m$-dependence of $\{X_i\}$, the rate of convergence in (16) still holds if we take the supremum over distinct pairs of pairs. This completes the proof. □
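The scaling $b_{2\epsilon}(2d)^{1/2} = C_d\,\epsilon^d$ used above can be checked numerically. The following is a minimal sketch, assuming (as is standard, but not restated in this section) that $b_\epsilon(d)$ denotes the volume $\pi^{d/2}\epsilon^d/\Gamma(d/2+1)$ of a Euclidean ball of radius $\epsilon$ in $R^d$. The function names `ball_volume` and `c_d` are illustrative and do not come from the paper.

```python
import math


def ball_volume(eps: float, d: int) -> float:
    """Volume of a Euclidean ball of radius eps in R^d (assumed meaning of b_eps(d))."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * eps ** d


def c_d(d: int) -> float:
    """Constant C_d with sqrt(b_{2 eps}(2d)) = C_d * eps^d, i.e. 2^d * sqrt(pi^d / d!)."""
    return 2 ** d * math.sqrt(math.pi ** d / math.factorial(d))


if __name__ == "__main__":
    for d in (1, 2, 3):
        for eps in (0.5, 0.1, 0.01):
            # The ratio should not depend on eps.
            ratio = math.sqrt(ball_volume(2 * eps, 2 * d)) / eps ** d
            print(f"d={d} eps={eps}: ratio={ratio:.6f}  C_d={c_d(d):.6f}")
```

Under this assumption the ratio is the same for every $\epsilon$, which is all the proof needs: the volume factor contributes exactly the order $\epsilon^d$, and the remaining integral supplies the $o(1)$.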



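Proof of Proposition 1. (i) Define the index set $\mathcal{I} = \mathcal{I}(n, m) := \{(i, j) : 1 \le i < j \le n,\ j - i > m\}$ and the reduced form of $N_n$:

$$N_n^* := \sum_{(i,j) \in \mathcal{I}} I(d(X_i, X_j) \le \epsilon).$$

Since $\mathrm{E}I(d(X_i, X_j) \le \epsilon) = q_{2,\epsilon}$ when $(i, j) \in \mathcal{I}$, Lemmas 1 and 3 yield the claim for the expectation:

$$\begin{aligned}
\mathrm{E}N_n &= \mathrm{E}N_n^* + \sum_{h=1}^{m} (n - h)\, P(d(X_1, X_{1+h}) \le \epsilon) = \binom{n-m}{2} q_{2,\epsilon} + o(n\epsilon^{d/2}) \\
&= \binom{n}{2} q_{2,\epsilon} + o(n\epsilon^{d/2}) \quad \text{as } n \to \infty.
\end{aligned} \tag{17}$$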
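To make the two counts concrete, here is a minimal brute-force sketch (not from the paper) of $N_n$ and its reduced form $N_n^*$. The function name `close_pair_counts`, the representation of observations as tuples, and the use of the Euclidean distance via `math.dist` are assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple


def close_pair_counts(
    xs: Sequence[Tuple[float, ...]], eps: float, m: int
) -> Tuple[int, int]:
    """Return (N_n, N_n_star) for observations xs.

    N_n counts all pairs i < j with d(x_i, x_j) <= eps.
    N_n_star counts only those pairs whose index gap j - i exceeds m,
    i.e. pairs of observations that are independent under m-dependence.
    """
    n = len(xs)
    n_full = 0
    n_reduced = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(xs[i], xs[j]) <= eps:
                n_full += 1
                if j - i > m:
                    n_reduced += 1
    return n_full, n_reduced


if __name__ == "__main__":
    sample = [(0.0,), (0.05,), (1.0,), (0.02,), (1.03,)]
    print(close_pair_counts(sample, eps=0.1, m=1))
```

The difference $N_n - N_n^*$ is the number of close pairs with index gap at most $m$; there are only $O(n)$ candidate pairs of that kind, which is why the proof can treat it as a remainder.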



For the variance of $N_n$, we first study the variance of $N_n^*$. To this end, we need to consider $|\mathcal{I}|^2 = \binom{n-m}{2}^2$ terms of the form

$$\mathrm{Cov}\big(I(d(X_{s_1}, X_{s_2}) \le \epsilon),\ I(d(X_{t_1}, X_{t_2}) \le \epsilon)\big), \tag{18}$$

where $(s_1, s_2), (t_1, t_2) \in \mathcal{I}$. To count the various types of such terms, we use results from Ch. 2.4.1, Theorem 1, in Lee (1990). It should be noted that these results are stated for $U$-statistics based on random variables, rather than random vectors, but the argument is merely combinatorial and thus also valid here.

1) When $|s_i - t_j| > m$, $i, j = 1, 2$, the random variables in (18) are independent and the covariance is zero.

2) Only two random variables are involved, i.e., $s_1 = t_1$ and $s_2 = t_2$, so the covariance is $\mathrm{Var}(I(d(X_{s_1}, X_{s_2}) \le \epsilon)) = q_{2,\epsilon} - q_{2,\epsilon}^2 = q_{2,\epsilon} + O(\epsilon^{2d})$ as $\epsilon \to 0$. The number of such terms is $|\mathcal{I}| = \binom{n-m}{2}$.

3) Exactly one of the four possible differences $|s_i - t_j|$ is zero and the rest are greater than $m$. By conditioning, the covariance is $\mathrm{Cov}(I(d(X_{s_1}, X_{s_2}) \le \epsilon), I(d(X_{s_1}, X_{t_2}) \le \epsilon)) = \sigma^2_{1,0,\epsilon}$. There are $6\binom{n-2m}{3}$ terms of this type.

4) For $h = 1, \ldots, m$, $0 < |s_i - t_j| = h \le m$ for one of the differences $|s_i - t_j|$ and the others are greater than $m$. Then the covariance is $\mathrm{Cov}(I(d(X_{s_1}, X_{s_2}) \le \epsilon), I(d(X_{s_1+h}, X_{t_2}) \le \epsilon)) = \sigma^2_{1,h,\epsilon}$. The number of terms of this type is $12\binom{n-2m-h}{3}$.

5) The number of the remaining terms is $O(n^2)$, so Lemma 3(iii) indicates that their sum is $o(n^2\epsilon^d)$ as $n \to \infty$.
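The counts for types 2 and 3 can be verified by brute force for small $n$ and $m$. The sketch below is an illustration under the reading of the list given above (type 3 meaning exactly one of the four differences $|s_i - t_j|$ is zero and the other three exceed $m$); the helper names are hypothetical.

```python
from itertools import combinations
from math import comb
from typing import List, Tuple


def index_set(n: int, m: int) -> List[Tuple[int, int]]:
    """All pairs (i, j) with 1 <= i < j <= n and j - i > m."""
    return [(i, j) for i, j in combinations(range(1, n + 1), 2) if j - i > m]


def count_type3(n: int, m: int) -> int:
    """Ordered pairs (s, t) of index pairs sharing exactly one index,
    with the three remaining cross differences all greater than m."""
    pairs = index_set(n, m)
    count = 0
    for s in pairs:
        for t in pairs:
            diffs = [abs(a - b) for a in s for b in t]
            if diffs.count(0) == 1 and all(x > m for x in diffs if x != 0):
                count += 1
    return count


if __name__ == "__main__":
    for m in (0, 1, 2):
        for n in range(2 * m + 3, 10):
            print(n, m, len(index_set(n, m)), comb(n - m, 2),
                  count_type3(n, m), 6 * comb(n - 2 * m, 3))
```

For every small case tried, the brute-force counts agree with $\binom{n-m}{2}$ and $6\binom{n-2m}{3}$, which matches the combinatorial argument cited from Lee (1990).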

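From the above and Lemma 3 we obtain

$$\begin{aligned}
\mathrm{Var}(N_n^*) &= \sum_{(s_1,s_2) \in \mathcal{I}}\ \sum_{(t_1,t_2) \in \mathcal{I}} \mathrm{Cov}\big(I(d(X_{s_1}, X_{s_2}) \le \epsilon),\ I(d(X_{t_1}, X_{t_2}) \le \epsilon)\big) \\
&= \binom{n-m}{2}\big(q_{2,\epsilon} - q_{2,\epsilon}^2\big) + 6\binom{n-2m}{3}\sigma^2_{1,0,\epsilon} + \sum_{h=1}^{m} 12\binom{n-2m-h}{3}\sigma^2_{1,h,\epsilon} + o(n^2\epsilon^d) \\
&= \binom{n-m}{2} q_{2,\epsilon} + 6\binom{n-2m}{3}\sigma^2_{1,0,\epsilon} + \sum_{h=1}^{m} 12\binom{n-2m-h}{3}\sigma^2_{1,h,\epsilon} + o(n^2\epsilon^d) \\
&= \frac{n^2}{2} q_{2,\epsilon} + n^3\Big(\sigma^2_{1,0,\epsilon} + 2\sum_{h=1}^{m} \sigma^2_{1,h,\epsilon}\Big) + o(n^2\epsilon^d) \quad \text{as } n \to \infty.
\end{aligned} \tag{19}$$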

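Next it is verified that $\mathrm{Var}(N_n^*)$ and $\mathrm{Var}(N_n)$ are close. Note that

$$\mathrm{Var}(N_n) = (n-1)C_{1,n} + \ldots + (n-m)C_{m,n} + \mathrm{Cov}(N_n, N_n^*), \tag{20}$$

where $C_{h,n} := \mathrm{Cov}(N_n, I(d(X_1, X_{1+h}) \le \epsilon))$, $h = 1, \ldots, m$. We have

$$\begin{aligned}
C_{h,n} &= \sum_{1 \le i < j \le n} \mathrm{Cov}\big(I(d(X_i, X_j) \le \epsilon),\ I(d(X_1, X_{1+h}) \le \epsilon)\big) \\
&= \mathrm{Var}\big(I(d(X_1, X_{1+h}) \le \epsilon)\big) + \sum_{\substack{1 \le i < j \le n \\ (i,j) \ne (1,1+h)}} \mathrm{Cov}\big(I(d(X_i, X_j) \le \epsilon),\ I(d(X_1, X_{1+h}) \le \epsilon)\big),
\end{aligned}$$

where the number of pairs in the last sum for which $I(d(X_i, X_j) \le \epsilon)$ and $I(d(X_1, X_{1+h}) \le \epsilon)$ are not independent is $O(n)$. From Lemma 3 we get $C_{h,n} = o(\epsilon^{d/2}) + o(n\epsilon^d)$ as $n \to \infty$, and consequently (20) leads to

$$\mathrm{Var}(N_n) = \mathrm{Cov}(N_n, N_n^*) + o(n\epsilon^{d/2}) + o(n^2\epsilon^d) \quad \text{as } n \to \infty. \tag{21}$$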

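Furthermore, observe that

$$\mathrm{Var}(N_n) = \mathrm{Var}(N_n^*) + \mathrm{Var}(N_n - N_n^*) + 2\,\mathrm{Cov}(N_n^*, N_n - N_n^*)$$

and hence (21) gives

$$\mathrm{Var}(N_n) = \mathrm{Var}(N_n^*) - \mathrm{Var}(N_n - N_n^*) + o(n\epsilon^{d/2}) + o(n^2\epsilon^d) \quad \text{as } n \to \infty. \tag{22}$$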
The next step is to show that

$$\mathrm{Var}(N_n - N_n^*) = o(n\epsilon^{d/2}) + o(n^2\epsilon^d) \quad \text{as } n \to \infty. \tag{23}$$

To this end, first note that Lemma 3(ii) implies

$$\mathrm{E}(N_n - N_n^*) = \sum_{\substack{1 \le i < j \le n \\ j - i \le m}} \mathrm{E}I(d(X_i, X_j) \le \epsilon) = o(n\epsilon^{d/2}) \quad \text{as } n \to \infty, \tag{24}$$

since the number of terms in this sum is $O(n)$, so we only need to prove that

$$\mathrm{E}(N_n - N_n^*)^2 = o(n\epsilon^{d/2}) + o(n^2\epsilon^d) \quad \text{as } n \to \infty. \tag{25}$$

We have

$$\mathrm{E}(N_n - N_n^*)^2 = \sum_{\substack{1 \le i < j \le n \\ j - i \le m}}\ \sum_{\substack{1 \le i' < j' \le n \\ j' - i' \le m}} \mathrm{E}\,I(d(X_i, X_j) \le \epsilon)\, I(d(X_{i'}, X_{j'}) \le \epsilon).$$

There are $O(n^2)$ terms in this double sum. Moreover, $O(n)$ of these have the form $P(d(X_i, X_j) \le \epsilon)$ and the remaining are of the type $P(d(X_{i_1}, X_{j_1}) \le \epsilon,\ d(X_{i_2}, X_{j_2}) \le \epsilon)$, where $\{i_1, j_1\} \ne \{i_2, j_2\}$, so (25) follows from Lemma 3. Therefore (23) holds true, and combining with (22) and (19), we get



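$$\mathrm{Var}(N_n) = \frac{n^2}{2} q_{2,\epsilon} + n^3\Big(\sigma^2_{1,0,\epsilon} + 2\sum_{h=1}^{m} \sigma^2_{1,h,\epsilon}\Big) + o(n\epsilon^{d/2}) + o(n^2\epsilon^d) \quad \text{as } n \to \infty. \tag{26}$$

The assertion is proved.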



(ii) If $\zeta_{1,m} > 0$, the claim follows directly from Lemma 3 and (26). When $\zeta_{1,m} = 0$, we have $\sup_{n \ge 1}\{n\epsilon^d\} < \infty$ by assumption. Hence, from (26) and the condition $n^2\epsilon^d \to a$, $0 < a \le \infty$,

$$\frac{\mathrm{Var}(N_n)}{\frac{1}{2} b_1(d)\, q_2\, n^2 \epsilon^d} = 1 + o(n\epsilon^d) + o\big(1/(n\epsilon^{d/2})\big) + o(1) \to 1 \quad \text{as } n \to \infty.$$

This completes the proof. □



Proof of Theorem 1. (i) From the assumption $n^2\epsilon^d \to 0$ and Proposition 1(i), we get $\mu_n, \sigma_n^2 \to 0$. Consequently, $N_n \xrightarrow{P} 0$ and the assertion follows.

(ii) By the condition $n^2\epsilon^d \to a$, $0 < a < \infty$, we have $n\epsilon^d \to 0$ and thus Proposition 1(i) and Lemma 2(i) yield $\mu_n \to \mu$. Moreover, since (25) and $n^2\epsilon^d \to a$, $0 < a < \infty$, imply $\mathrm{E}(N_n^* - N_n)^2 \to 0$, it is enough to verify the Poisson convergence for $N_n^*$. For this we apply the Stein-Chen Poisson approximation method (Barbour et al., 1992). To measure the deviation between two probability distributions $\mathcal{P}_1$ and $\mathcal{P}_2$ of integer-valued random variables, we use the total variation distance

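$$d_{TV}(\mathcal{P}_1, \mathcal{P}_2) := \sup\{|\mathcal{P}_1(C) - \mathcal{P}_2(C)| : C \subseteq \{0, 1, \ldots\}\} = \frac{1}{2}\sum_{i \ge 0} |\mathcal{P}_1(i) - \mathcal{P}_2(i)|.$$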
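The two expressions for the total variation distance can be compared directly on a small example. The sketch below is only an illustration: the two probability vectors are made up, and the supremum is evaluated by enumerating all subsets of a finite support, which is feasible only for very small supports.

```python
from itertools import combinations
from typing import Sequence


def tv_half_l1(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance as half the L1 distance between the pmfs."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def tv_sup_over_sets(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance as the supremum of |P(C) - Q(C)| over all subsets C."""
    k = len(p)
    best = 0.0
    for r in range(k + 1):
        for subset in combinations(range(k), r):
            diff = abs(sum(p[i] for i in subset) - sum(q[i] for i in subset))
            best = max(best, diff)
    return best


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2, 0.0]
    q = [0.25, 0.25, 0.25, 0.25]
    print(tv_half_l1(p, q), tv_sup_over_sets(p, q))
```

The supremum is attained at the set of points where the first pmf exceeds the second, which is why the two formulas agree.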

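Recall the definition $\mathcal{I} := \{(i, j) : 1 \le i < j \le n,\ j - i > m\}$ and let $\mathcal{I}_{(i,j)} := \mathcal{I} \setminus \{(i, j)\}$. Further, using the notation $I_\epsilon(i, j) := I(d(X_i, X_j) \le \epsilon)$, we define

$$\mathcal{I}^{0}_{(i,j)} := \{(k, l) \in \mathcal{I}_{(i,j)} : I_\epsilon(k, l) \text{ and } I_\epsilon(i, j) \text{ are dependent}\}.$$

Since $\mathrm{E}I_\epsilon(i, j) = q_{2,\epsilon}$ for all $(i, j) \in \mathcal{I}$, Corollary 2.C.5 in Barbour et al. (1992) yields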

d TV (C(N*),Po^* n )) (27) 

^(e^+E E E E E/ e (*-,i)i e (M)l 



where $\mathcal{L}(N_n^*)$ denotes the distribution of $N_n^*$ and $\mu_n^* := \mathrm{E}N_n^*$. The $m$-dependence of $\{X_i\}$ gives that the number of terms in $\mathcal{I}^{0}_{(i,j)}$, for each $(i, j) \in \mathcal{I}$, has the bound

$$|\mathcal{I}^{0}_{(i,j)}| \le n(4m + 2),$$

so for the first two sums in (27), Lemma 3 and the condition $n^2\epsilon^d \to a$, $0 < a < \infty$, lead to

J2 q h ^ q h -\ 2 ) n(4m + 2) ^' e ~ (2m + l ) h ^ d ) 2( & a2n ~ l as n ^ oo. 

(ij)ex (i,i)eX(fc,/)ex { o j} V 7 

(28) 

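For the last sum in (27), we see that the various types of terms can be counted in a similar way as the covariances in (18). With the notation $q_{3,h,\epsilon} := \mathrm{E}\,p_{X,\epsilon}(X_1)\,p_{X,\epsilon}(X_{1+h})$, it follows from $n^2\epsilon^d \to a$, $0 < a < \infty$, and Lemma 3(i) that

$$\begin{aligned}
\sum_{(i,j) \in \mathcal{I}}\ \sum_{(k,l) \in \mathcal{I}^{0}_{(i,j)}} \mathrm{E}\,I_\epsilon(i, j)\, I_\epsilon(k, l) &= 6\binom{n-2m}{3} q_{3,0,\epsilon} + \sum_{h=1}^{m} 12\binom{n-2m-h}{3} q_{3,h,\epsilon} + o(n^2\epsilon^d) \\
&= b_1(d)^2 a^2 \Big(\mathrm{E}\,p(X_1)^2 + 2\sum_{h=1}^{m} \mathrm{E}\,p(X_1)\,p(X_{1+h})\Big) n^{-1} + o(n^{-1}) + o(1) \to 0 \quad \text{as } n \to \infty.
\end{aligned} \tag{29}$$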
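Note also that

$$\mu_n^* = \binom{n-m}{2} q_{2,\epsilon} \to \mu \quad \text{as } n \to \infty. \tag{30}$$

By combining (27), (28), (29), and (30),

$$d_{TV}\big(\mathcal{L}(N_n^*), \mathrm{Po}(\mu_n^*)\big) \to 0 \quad \text{as } n \to \infty. \tag{31}$$

Finally, if $Z_n \sim \mathrm{Po}(\mu_n^*)$ and $Z \sim \mathrm{Po}(\mu)$, then (30) yields $Z_n \xrightarrow{D} Z$, so from (31) we obtain

$$P(N_n^* = k) - P(Z = k) = P(N_n^* = k) - P(Z_n = k) + P(Z_n = k) - P(Z = k) \to 0 \quad \text{as } n \to \infty,$$

for each $k = 0, 1, \ldots$, and thus $N_n^* \xrightarrow{D} Z$. The statement follows.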

(iii) The idea of the proof is to apply Lemma 2 to the reduced form $N_n^*$ of $N_n$. In fact, by (24), (25), and Proposition 1(ii), for the last two terms in the decomposition

$$\frac{N_n - \mathrm{E}N_n}{\sigma_n} = \frac{N_n^* - \mathrm{E}N_n^*}{\sigma_n} + \frac{\mathrm{E}(N_n^* - N_n)}{\sigma_n} + \frac{N_n - N_n^*}{\sigma_n}$$

we have $\mathrm{E}(N_n^* - N_n)/\sigma_n \to 0$ and $(N_n - N_n^*)/\sigma_n \to 0$ (in quadratic mean). Consequently, it is enough to show

$$Z_n := \frac{N_n^* - \mathrm{E}N_n^*}{\sigma_n} \xrightarrow{D} N(0, 1) \quad \text{as } n \to \infty. \tag{32}$$

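The weak convergence (32) follows from Lemma 2 if there exist successive approximations $\{Z_n^{(k)}\}_{k=1}^{\infty}$ of $Z_n$ such that

c1) $Z_n^{(k)} \xrightarrow{D} Z^{(k)}$ as $n \to \infty$ and $Z^{(k)} \xrightarrow{D} N(0, 1)$ as $k \to \infty$.

c2) For every $\delta > 0$,

$$\lim_{k \to \infty} \limsup_{n \to \infty} P(|Z_n - Z_n^{(k)}| > \delta) = 0.$$

In order to construct such $Z_n^{(k)}$, for each $k = 1, 2, \ldots$, define an integer $s = s(k, n) := [n/(k + m)]$ and consider the set of $k$-subsets of $\{1, \ldots, n\}$

$$S_i^{(k)} := \{(i - 1)(k + m) + 1, \ldots, (i - 1)(k + m) + k\}, \quad i = 1, \ldots, s.$$

Now, based on the subset $\mathcal{I}^{(k)} := \{(i, j) : i \in S_l^{(k)},\ j \in S_t^{(k)},\ 1 \le l < t \le s\}$ of $\mathcal{I}$, let

$$U_n^{(k)} := \sum_{(i,j) \in \mathcal{I}^{(k)}} I(d(X_i, X_j) \le \epsilon), \qquad Z_n^{(k)} := \frac{U_n^{(k)} - \mathrm{E}U_n^{(k)}}{\sigma_n}.$$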
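The blocking construction can be sketched in a few lines. The code below is an illustration only, assuming $[\cdot]$ denotes the integer part; the function names `blocks` and `block_index_set` are hypothetical. It builds the blocks $S_i^{(k)}$ of length $k$ separated by gaps of length $m$, and the pairs of indices taken from two different blocks.

```python
from typing import List, Tuple


def blocks(n: int, k: int, m: int) -> List[List[int]]:
    """Blocks S_1, ..., S_s of k consecutive indices, separated by m skipped indices."""
    s = n // (k + m)
    return [
        list(range((i - 1) * (k + m) + 1, (i - 1) * (k + m) + k + 1))
        for i in range(1, s + 1)
    ]


def block_index_set(n: int, k: int, m: int) -> List[Tuple[int, int]]:
    """Pairs (i, j) with i and j taken from two different blocks, i in the earlier one."""
    bl = blocks(n, k, m)
    return [
        (i, j)
        for a in range(len(bl))
        for b in range(a + 1, len(bl))
        for i in bl[a]
        for j in bl[b]
    ]


if __name__ == "__main__":
    print(blocks(20, 3, 2))
    pairs = block_index_set(20, 3, 2)
    print(len(pairs), min(j - i for i, j in pairs))
```

Since consecutive blocks are separated by $m$ skipped indices, any two indices from different blocks differ by more than $m$. This is what makes the pooled block vectors independent under $m$-dependence and makes the block pairs a subset of $\mathcal{I}$.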
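First we prove c2). Denote by $M_2$, $M_3$, and $\{M_{4,h}\}_{h=1}^{m}$ the numbers of terms of types 2, 3, and 4, respectively, defined in the proof of Proposition 1(i). Furthermore, let $M_2^{(k)}$, $M_3^{(k)}$, and $\{M_{4,h}^{(k)}\}_{h=1}^{m}$ be the numbers of these terms that also appear in

$$\mathrm{Var}(U_n^{(k)}) = \sum_{(s_1,s_2) \in \mathcal{I}^{(k)}}\ \sum_{(t_1,t_2) \in \mathcal{I}^{(k)}} \mathrm{Cov}\big(I(d(X_{s_1}, X_{s_2}) \le \epsilon),\ I(d(X_{t_1}, X_{t_2}) \le \epsilon)\big).$$

Since $\mathcal{I}^{(k)} \subseteq \mathcal{I}$, we observe that

$$\begin{aligned}
\mathrm{Var}(N_n^* - U_n^{(k)}) &\le \sum_{(s_1,s_2) \in \mathcal{I}}\ \sum_{(t_1,t_2) \in \mathcal{I}} \big|\mathrm{Cov}\big(I(d(X_{s_1}, X_{s_2}) \le \epsilon),\ I(d(X_{t_1}, X_{t_2}) \le \epsilon)\big)\big| \\
&\quad - \sum_{(s_1,s_2) \in \mathcal{I}^{(k)}}\ \sum_{(t_1,t_2) \in \mathcal{I}^{(k)}} \big|\mathrm{Cov}\big(I(d(X_{s_1}, X_{s_2}) \le \epsilon),\ I(d(X_{t_1}, X_{t_2}) \le \epsilon)\big)\big| \\
&= (M_2 - M_2^{(k)})\, q_{2,\epsilon} + (M_3 - M_3^{(k)})\, \sigma^2_{1,0,\epsilon} + \sum_{h=1}^{m} (M_{4,h} - M_{4,h}^{(k)})\, |\sigma^2_{1,h,\epsilon}| \\
&\quad + o(n^2\epsilon^d) \quad \text{as } n \to \infty.
\end{aligned} \tag{33}$$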

First, for $h = 1, \ldots, m$, we obtain a lower bound for $M_{4,h}^{(k)}$. Assume without loss of generality that $k \ge 2m + 1$. There are $\binom{s}{2}(k - 2h)^2$ elements $(s_1, s_2)$ in $\mathcal{I}^{(k)}$ that also satisfy $(s_1 + d_1, s_2 + d_2) \in \mathcal{I}^{(k)}$, where $d_1, d_2 = \pm h$. For every element of this type, an index $t_1$ with $|s_1 - t_1| = h$ or $|s_2 - t_1| = h$ can be chosen in 4 different ways. Moreover, for each such alternative we can choose $t_2$ in at least $(s - 2)k$ different ways. We get that $M_{4,h}^{(k)}$ is bounded from below by

$$\underline{M}_{4,h}^{(k)} := \binom{s}{2}(k - 2h)^2\, 4(s - 2)k \sim \frac{2(k - 2h)^2 k}{(k + m)^3}\, n^3 \quad \text{as } n \to \infty. \tag{34}$$



(35) 



Further, by Proposition [T](m,), Lemma E](%>, and the limits M^h ~ 2n 3 and (|3"4"|) . we get 



c 2 cr 2 n 3 e M 



/ (k — 2h) 2 k\ 
-> C 4 ,h ( 1 - ^ fc + m ^ 3 ) |Cov(p(Xi),p(X m ))| as n -> 00, 

— ffc) _ (k) (k) 

for some < C^h < 00. By a similar argument, we have lower bounds M 2 and M3 for M 2 

(k) 

and M 3 , respectively, such that, for some < C 2 < and < C3 < 00, 
(M 2 -M 2 (fc V £ nV (M 2 -M 2 (fc) )g 2 , e / fc 2 \ 

^ ^" > Ca V " otT^Fj 92 ' (36) 

(M 3 - M? ))of n 3 £ 2 d (Ms _ Mf )a? / fc3 X 

— >■ 63 I 1 — — yT Var(p(Ai)J as n — > 00. 



cr 2 a 2 n 3 e 2rf \ (k + m 
Now, from (33), (35), (36), and Proposition 1(ii),

$$\lim_{k \to \infty} \limsup_{n \to \infty} \mathrm{Var}\left(\frac{N_n^* - U_n^{(k)}}{\sigma_n}\right) = 0,$$

and hence, since $Z_n - Z_n^{(k)}$ has zero mean, c2) is implied by Chebyshev's inequality.

Next we prove c1). Let

(Ik 2 k 3 (fc) 



V(k) := { 



2(k + m r q2 + WTm^ blid)Cl ' ma n 
— J -zr i 1 j o < a < oo, 

-q 2 + 6i(d)Cl,mO 

k 3 Cl m 

a = oo, 



. (A; + m) 3 Ci, m : 



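where

$$\zeta_{1,m}^{(k)} := \mathrm{Var}(p(X_1)) + 2\sum_{h=1}^{m} (1 - h/k)\, \mathrm{Cov}(p(X_1), p(X_{1+h})).$$

Note that, since $\zeta_{1,m}^{(k)} \to \zeta_{1,m}$ as $k \to \infty$, we have $V(k) \to 1$ as $k \to \infty$, so c1) follows if it can be verified that

$$Z_n^{(k)} \xrightarrow{D} N(0, V(k)) \quad \text{as } n \to \infty. \tag{37}$$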

To prove this, we apply the corresponding result of Jammalamadaka and Janson (1986) for independent samples. In fact, if we introduce the pooled random vectors in $R^{kd}$

$$Y_i := (X_{(i-1)(k+m)+1}, \ldots, X_{(i-1)(k+m)+k}), \quad i = 1, \ldots, s,$$

then the $m$-dependence of $\{X_i\}$ implies that $\{Y_i\}$ is an independent sequence in $R^{kd}$. Thus $U_n^{(k)}$ can be represented as a $U$-statistic with respect to the independent sample $Y_1, \ldots, Y_s$,

$$U_n^{(k)} = \sum_{1 \le i < j \le s} f_s^{(k)}(Y_i, Y_j), \qquad f_s^{(k)}(Y_i, Y_j) := \sum_{l \in S_i^{(k)}}\ \sum_{t \in S_j^{(k)}} I(d(X_l, X_t) \le \epsilon).$$

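Furthermore, let $g_s^{(k)}(Y_1) := \mathrm{E}(f_s^{(k)}(Y_1, Y_2) \mid Y_1) = k\sum_{i=1}^{k} p_{X,\epsilon}(X_i)$ and define

$$\eta_{s,k}^2 := \tfrac{1}{2} s^2\, \mathrm{Var}(f_s^{(k)}(Y_1, Y_2)) + s^3\, \mathrm{Var}(g_s^{(k)}(Y_1)). \tag{38}$$

Since $n^2\epsilon^d \to \infty$ yields $s^2\epsilon^d \to \infty$, we obtain from Lemma 3 that

$$\eta_{s,k}^2 \ge \tfrac{1}{2} s^2\, \mathrm{Var}(f_s^{(k)}(Y_1, Y_2)) \sim \tfrac{1}{2} k^2\, b_1(d)\, q_2\, s^2 \epsilon^d \to \infty \quad \text{as } n \to \infty, \tag{39}$$

and therefore

$$\sup_{y_1, y_2} |f_s^{(k)}(y_1, y_2)| = k^2 = o(\eta_{s,k}) \quad \text{as } n \to \infty. \tag{40}$$

Moreover, Jammalamadaka and Janson (1986) show that $\sup_x p_{X,\epsilon}(x) = o(\epsilon^{d/2})$ as $\epsilon \to 0$, and hence the stationarity of $\{X_i\}$ and (39) give

$$\sup_y \mathrm{E}|f_s^{(k)}(y, Y_1)| \le k^2 \sup_x \mathrm{E}I(d(x, X_1) \le \epsilon) = k^2 \sup_x p_{X,\epsilon}(x) = o(\epsilon^{d/2}) = o(\eta_{s,k}/s) \quad \text{as } n \to \infty. \tag{41}$$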
Since $n \to \infty$ implies $s \to \infty$, we get from (40) and (41) that the conditions of Theorem 2.1 in Jammalamadaka and Janson (1986) are satisfied. Consequently, $(U_n^{(k)} - \mathrm{E}U_n^{(k)})/\eta_{s,k}$ is asymptotically standard normal as $n \to \infty$.

Further, using Proposition 1(ii) and definition (38), it is straightforward to show

$$\eta_{s,k}^2/\sigma_n^2 \to V(k) \quad \text{as } n \to \infty,$$

so the desired limit (37) of $Z_n^{(k)}$ follows from this asymptotic normality and the Slutsky theorem. This completes the proof. □



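Proof of Theorem 2. (i) From Lemma 3, Proposition 1(i), and the condition $n^2\epsilon^d \to \infty$,

$$\mathrm{E}Q_n = \tilde{q}_{2,\epsilon} + o\Big(\frac{1}{n\epsilon^{d/2}}\Big) \to q_2, \tag{42}$$

$$\begin{aligned}
\mathrm{Var}(Q_n) = \binom{n}{2}^{-2} b_\epsilon(d)^{-2} \sigma_n^2 &= O\Big(\frac{1}{n^2\epsilon^d}\Big) + O\Big(\frac{1}{n}\Big) + o\Big(\frac{1}{n^3\epsilon^{3d/2}}\Big) + o\Big(\frac{1}{n^2\epsilon^d}\Big) \\
&= O\Big(\frac{1}{n^2\epsilon^d}\Big) + O\Big(\frac{1}{n}\Big) \to 0 \quad \text{as } n \to \infty.
\end{aligned}$$

Hence, the assertion follows.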
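For orientation, the estimator whose moments are computed above can be sketched as follows. This is a minimal illustration under two assumptions that are not restated in this section: that $Q_n = N_n / (\binom{n}{2} b_\epsilon(d))$, as suggested by the normalization of $\mathrm{Var}(Q_n)$, and that $b_\epsilon(d)$ is the volume of a Euclidean ball of radius $\epsilon$ in $R^d$. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def ball_volume(eps: float, d: int) -> float:
    """Volume of a Euclidean ball of radius eps in R^d (assumed meaning of b_eps(d))."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * eps ** d


def q_n(xs: Sequence[Tuple[float, ...]], eps: float) -> float:
    """Assumed form of the estimator: eps-close pair count divided by C(n,2) * b_eps(d)."""
    n = len(xs)
    d = len(xs[0])
    n_close = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if math.dist(xs[i], xs[j]) <= eps
    )
    return n_close / (math.comb(n, 2) * ball_volume(eps, d))


if __name__ == "__main__":
    sample = [(0.0,), (0.05,), (1.0,), (1.02,), (3.0,)]
    print(q_n(sample, eps=0.1))
```

In this toy sample two of the ten pairs are within distance $0.1$, and the ball volume in dimension one is $2\epsilon = 0.2$, so the sketch returns a value close to $1$.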

(ii) In order to avoid condition A, we repeat the argument of Proposition 1(i) with the convergence rates in Lemma 3(ii)-(iii) replaced by the weaker limits

$$\sup P(d(X_{i_1}, X_{j_1}) \le \epsilon,\ d(X_{i_2}, X_{j_2}) \le \epsilon) \le \sup P(d(X_i, X_j) \le \epsilon) \to 0 \quad \text{as } \epsilon \to 0, \tag{43}$$

which follow from the stationarity and $m$-dependence of $\{X_i\}$ and since $P(X_i = X_j) = 0$. First we obtain from (17) that

$$\mathrm{E}N_n = \binom{n}{2} q_{2,\epsilon} + o(n) \quad \text{as } n \to \infty. \tag{44}$$

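Moreover, if we use (43) in place of Lemma 3 in the derivation of (19), (22), (23), and thus finally (26), it follows that

$$\begin{aligned}
\mathrm{Var}(N_n^*) &= \frac{n^2}{2} q_{2,\epsilon} + n^3\Big(\sigma^2_{1,0,\epsilon} + 2\sum_{h=1}^{m} \sigma^2_{1,h,\epsilon}\Big) + o(n^2), \\
\mathrm{Var}(N_n) &= \mathrm{Var}(N_n^*) - \mathrm{Var}(N_n - N_n^*) + o(n^2), \\
\mathrm{Var}(N_n - N_n^*) &= o(n^2), \\
\sigma_n^2 &= \frac{n^2}{2} q_{2,\epsilon} + n^3\Big(\sigma^2_{1,0,\epsilon} + 2\sum_{h=1}^{m} \sigma^2_{1,h,\epsilon}\Big) + o(n^2) \quad \text{as } n \to \infty.
\end{aligned} \tag{45}$$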
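From (44), the last statement in (45), and the condition $n\epsilon^d \to a$, $0 < a \le \infty$,

$$\mathrm{E}Q_n = \tilde{q}_{2,\epsilon} + o\Big(\frac{1}{n\epsilon^d}\Big) \to q_2,$$

$$\mathrm{Var}(Q_n) = \binom{n}{2}^{-2} b_\epsilon(d)^{-2} \sigma_n^2 = O\Big(\frac{1}{n^2\epsilon^d}\Big) + O\Big(\frac{1}{n}\Big) + o\Big(\frac{1}{n^2\epsilon^{2d}}\Big) \to 0 \quad \text{as } n \to \infty,$$

so the claim holds true. This completes the proof. □

Proof of Theorem 3. (i) Let

$$\sqrt{n}(Q_n - \tilde{q}_{2,\epsilon}) = \sqrt{n}\binom{n}{2}^{-1} b_\epsilon(d)^{-1}(N_n - \mathrm{E}N_n) + R_n, \tag{46}$$

where, by Proposition 1(i) and the assumption $n\epsilon^d \to a$, $0 < a \le \infty$,

$$R_n := \sqrt{n}(\mathrm{E}Q_n - \tilde{q}_{2,\epsilon}) = o\Big(\frac{1}{\sqrt{n\epsilon^d}}\Big) \to 0 \quad \text{as } n \to \infty. \tag{47}$$

Furthermore, from Proposition 1(ii),

$$\Big(\sqrt{n}\binom{n}{2}^{-1} b_\epsilon(d)^{-1}\Big)^2 \sigma_n^2 \to \nu/a + 4\zeta_{1,m} \quad \text{as } n \to \infty. \tag{48}$$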

Finally, combining (46), (47), (48), Theorem 1(iii), and the Slutsky theorem gives the assertion for $Q_n$. The statement about $H_n$ follows from Proposition 2 in Leonenko and Seleznjev (2010).

(ii) The details are omitted, since the argument is similar to that of (i), using the decomposition corresponding to (46) with the $n\epsilon^{d/2}$-scaling. This completes the proof. □



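Proof of Theorem 4. (i) As in Leonenko and Seleznjev (2010), the density smoothness condition yields

$$|\tilde{q}_{2,\epsilon} - q_2| \le \tfrac{1}{2} K^2 \epsilon^{2\alpha}. \tag{49}$$

This bound and Proposition 1(i) imply the assertion

$$|\mathrm{E}Q_n - q_2| \le |\tilde{q}_{2,\epsilon} - q_2| + |\mathrm{E}Q_n - \tilde{q}_{2,\epsilon}| \le \tfrac{1}{2} K^2 \epsilon^{2\alpha} + o\big(1/(n\epsilon^{d/2})\big) \quad \text{as } n \to \infty.$$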

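(ii) Note that, by the assumptions $\epsilon \sim c\,n^{-2/(4\alpha+d)}$ and $0 < \alpha \le d/4$,

$$n^2\epsilon^d \sim c^d\, n^{8\alpha/(4\alpha+d)} \le c^d\, n \quad \text{as } n \to \infty, \tag{50}$$

and hence, from (42),

$$\mathrm{Var}(Q_n) = O\big(n^{-8\alpha/(4\alpha+d)}\big) \quad \text{as } n \to \infty.$$

Further, since $\epsilon^{2\alpha} \sim c^{2\alpha}\, n^{-4\alpha/(4\alpha+d)}$, we get from (i) and (50) that the bias fulfills

$$|\mathrm{E}Q_n - q_2| = O\big(n^{-4\alpha/(4\alpha+d)}\big) \quad \text{as } n \to \infty.$$

Consequently, for some $C > 0$ and any $A > 0$,

$$P\big(|Q_n - q_2| > A\, n^{-4\alpha/(4\alpha+d)}\big) \le n^{8\alpha/(4\alpha+d)}\, \frac{\mathrm{Var}(Q_n) + (\mathrm{E}Q_n - q_2)^2}{A^2} \le \frac{C}{A^2},$$

and the desired convergence for $Q_n$ follows. Moreover, combining this with Proposition 2 in Leonenko and Seleznjev (2010) proves the statement for $H_n$.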
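The exponent bookkeeping behind (50) and the bias rate can be spelled out as follows. This is only a worked restatement of the arithmetic, using the relation $\epsilon \sim c\,n^{-2/(4\alpha+d)}$ assumed in this part of the theorem.

```latex
\begin{align*}
n^2 \epsilon^d &\sim c^d\, n^{2 - 2d/(4\alpha+d)} = c^d\, n^{8\alpha/(4\alpha+d)}, \\
\frac{8\alpha}{4\alpha+d} \le 1 &\iff 4\alpha \le d, \\
\epsilon^{2\alpha} &\sim c^{2\alpha}\, n^{-4\alpha/(4\alpha+d)}, \\
\frac{1}{n\epsilon^{d/2}} = (n^2\epsilon^d)^{-1/2} &\sim c^{-d/2}\, n^{-4\alpha/(4\alpha+d)}.
\end{align*}
```

With this choice of $\epsilon$, the squared bias and the variance are therefore both of order $n^{-8\alpha/(4\alpha+d)}$, which is what the Chebyshev-type bound uses.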
(iii) The argument is similar to that of (ii) and therefore is left out. This completes the proof. □



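Proof of Proposition 2. First we study the expectation of $U_{h,n}$. By Lemma 1, the number of 3-tuples $(s_1, s_2, s_3)$ that satisfy $s_{i+1} - s_i > h + m$, $i = 1, 2$, is $\binom{n - 2(m+h)}{3}$. Furthermore, observe that $3!$ of the elements $(i, j, k) \in \mathcal{E}_{h,n}$ are permutations of $(s_1, s_2, s_3)$. For the corresponding variables, $X_j$ and $X_k$ are mutually independent and also independent of $\{X_i, X_{i+h}\}$, so for these we obtain

$$\mathrm{E}I(d(X_i, X_j) \le \epsilon_0,\ d(X_{i+h}, X_k) \le \epsilon_0) = \mathrm{E}\,p_{X,\epsilon_0}(X_i)\,p_{X,\epsilon_0}(X_{i+h}) = \mathrm{E}\,p_{X,\epsilon_0}(X_1)\,p_{X,\epsilon_0}(X_{1+h}).$$

Thus, from Lemma 3 and the assumption $n\epsilon_0^d \to \infty$,

$$\begin{aligned}
\mathrm{E}U_{h,n} &= 3!\binom{n - 2(m+h)}{3} M_{h,n}^{-1}\, b_{\epsilon_0}(d)^{-2}\, \mathrm{E}\,p_{X,\epsilon_0}(X_1)\,p_{X,\epsilon_0}(X_{1+h}) + o\big(1/(n\epsilon_0^d)\big) \\
&\to \mathrm{E}\,p(X_1)\,p(X_{1+h}) = q_{3,h} \quad \text{as } n \to \infty.
\end{aligned} \tag{51}$$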

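Next we consider the variance of $U_{h,n}$. Using the notation $I_{\epsilon_0}(i, j) := I(d(X_i, X_j) \le \epsilon_0)$, we have

$$\mathrm{Var}(U_{h,n}) = M_{h,n}^{-2}\, b_{\epsilon_0}(d)^{-4} \sum_{(s_1,s_2,s_3) \in \mathcal{E}_{h,n}}\ \sum_{(t_1,t_2,t_3) \in \mathcal{E}_{h,n}} \mathrm{Cov}\big(I_{\epsilon_0}(s_1, s_2)\, I_{\epsilon_0}(s_1 + h, s_3),\ I_{\epsilon_0}(t_1, t_2)\, I_{\epsilon_0}(t_1 + h, t_3)\big). \tag{52}$$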

We count the number of terms in this sum that are zero. Lemma 1 implies that the number of 6-tuples $\{u_1, \ldots, u_6\} \subseteq \{1, \ldots, n\}$ with $u_{i+1} - u_i > m + h$, $i = 1, \ldots, 5$, is $\binom{n - 5(m+h)}{6}$. Each such 6-tuple can be divided and permuted into $\binom{6}{3} \times 3! \times 3! = 6!$ pairs $(s_1, s_2, s_3), (t_1, t_2, t_3) \in \mathcal{E}_{h,n}$. The $m$-dependence of $\{X_i\}$ yields that the corresponding random variables are independent, and hence at least

$$6!\binom{n - 5(m+h)}{6} = M_{h,n}^2 + O(n^5) \quad \text{as } n \to \infty$$

summands in (52) are zero. For each of the $O(n^5)$ non-zero terms,

$$\begin{aligned}
&\big|\mathrm{Cov}\big(I_{\epsilon_0}(s_1, s_2)\, I_{\epsilon_0}(s_1 + h, s_3),\ I_{\epsilon_0}(t_1, t_2)\, I_{\epsilon_0}(t_1 + h, t_3)\big)\big| \\
&\qquad \le \big(\mathrm{E}\,I_{\epsilon_0}(s_1, s_2)\, I_{\epsilon_0}(s_1 + h, s_3)\big)^{1/2} \big(\mathrm{E}\,I_{\epsilon_0}(t_1, t_2)\, I_{\epsilon_0}(t_1 + h, t_3)\big)^{1/2},
\end{aligned}$$

so Lemma 3(iii) gives that the sum of the non-zero terms in (52) is $o(n^5\epsilon_0^d)$ as $n \to \infty$. Combining this with the condition $n\epsilon_0^{3d} \to c$, $0 < c \le \infty$, we get

$$\mathrm{Var}(U_{h,n}) = o\Big(\frac{1}{n\epsilon_0^{3d}}\Big) \to 0 \quad \text{as } n \to \infty. \tag{53}$$

Finally, from (51) and (53) it follows that $\mathrm{E}(U_{h,n} - q_{3,h})^2 \to 0$, which completes the proof. □



Proof of Theorem 5. The argument is similar to that of Theorem 6 in Leonenko and Seleznjev (2010), so we show the main steps only. From the decomposition

$$\sqrt{n}(Q_n - q_2) = \sqrt{n}(Q_n - \tilde{q}_{2,\epsilon}) + \sqrt{n}(\tilde{q}_{2,\epsilon} - q_2), \tag{54}$$

we see that the assertion for $Q_n$ is implied by the Slutsky theorem if $\sqrt{n}(Q_n - \tilde{q}_{2,\epsilon}) \xrightarrow{D} N(0, \nu/a + 4\zeta_{1,m})$ and $|\sqrt{n}(\tilde{q}_{2,\epsilon} - q_2)| \to 0$. The asymptotic normality follows straight away from Theorem 3(i). Furthermore, the conditions for $\alpha$ and $\epsilon$ together with bound (49) lead to the desired convergence of $|\sqrt{n}(\tilde{q}_{2,\epsilon} - q_2)|$. Finally, Proposition 2 in Leonenko and Seleznjev (2010) proves the claim for $H_n$. This completes the proof. □



Proof of Theorem 6. (i) We use the decomposition corresponding to (54):

$$n^{\beta/2} c^{d/2}(Q_n - q_2) = n^{\beta/2} c^{d/2}(Q_n - \tilde{q}_{2,\epsilon}) + n^{\beta/2} c^{d/2}(\tilde{q}_{2,\epsilon} - q_2). \tag{55}$$

Note that the condition $\epsilon \sim c\,n^{-(2-\beta)/d}$ gives $n\epsilon^d \to 0$ and $n^{\beta/2} c^{d/2}/(n\epsilon^{d/2}) \to 1$, so the asymptotic normality

$$n^{\beta/2} c^{d/2}(Q_n - \tilde{q}_{2,\epsilon}) \xrightarrow{D} N(0, \nu) \quad \text{as } n \to \infty \tag{56}$$

follows from Theorem 3(ii) and the Slutsky theorem. Further, the assumptions $\alpha > (d/4)C_\beta$, $\epsilon \sim c\,n^{-(2-\beta)/d}$, and bound (49) imply

$$|n^{\beta/2} c^{d/2}(\tilde{q}_{2,\epsilon} - q_2)| \le c^{d/2}\, \tfrac{1}{2} K^2\, n^{\beta/2} \epsilon^{2\alpha} \sim c^{d/2 + 2\alpha}\, \tfrac{1}{2} K^2\, n^{\beta/2 - 2\alpha(2-\beta)/d} \to 0 \quad \text{as } n \to \infty, \tag{57}$$

since $\beta/2 - 2\alpha(2 - \beta)/d < \beta/2 - 2(d/4)C_\beta(2 - \beta)/d = 0$. Thus, from (55), (56), (57), and the Slutsky theorem, we obtain the statement for $Q_n$. The assertion for $H_n$ follows by an argument similar to that of Proposition 2 in Leonenko and Seleznjev (2010).

(ii) The argument follows the same steps as that of (i) and consequently is omitted. This completes the proof. □
5 Appendix. Estimation for discrete distributions

Consider a stationary $m$-dependent sequence $\{X_1, \ldots, X_n\}$ with discrete $d$-dimensional (marginal) distribution $\mathcal{P} = \{p(k), k \in N^d\}$. We present some results on the estimation of quadratic Rényi entropy for discrete distributions

$$h_2 := -\log\Big(\sum_k p(k)^2\Big)$$

and the corresponding quadratic functional

$$q_2 := \sum_k p(k)^2 = P(X = Y),$$



k 
where $X$ and $Y$ are independent vectors with distribution $\mathcal{P}$. Similarly to the continuous case, let

$$\zeta_{1,m} := \mathrm{Var}(p(X_1)) + 2\sum_{h=1}^{m} \mathrm{Cov}(p(X_1), p(X_{1+h})).$$

Define the normalized statistic

$$Q_n := \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} I(X_i = X_j)$$

to be an estimator for $q_2$. Let $H_n := -\log(\max(Q_n, 1/n))$ be the corresponding estimator for $h_2$.
For $h = 0, \ldots, m$, we also introduce the following estimator for $q_{3,h} := \mathrm{E}\,p(X_1)\,p(X_{1+h})$,

$$U_{h,n} := M_{h,n}^{-1} \sum_{(i,j,k) \in \mathcal{E}_{h,n}} I(X_i = X_j,\ X_{i+h} = X_k),$$

where $\mathcal{E}_{h,n}$ and $M_{h,n}$ are defined as in Section 2. By an argument similar to that of Proposition 2, we get $U_{h,n} \xrightarrow{P} q_{3,h}$. Hence, for $r \ge m$, a consistent estimator for $\zeta_{1,m}$ is given by

$$s_{r,n}^2 := U_{0,n} - Q_n^2 + 2\sum_{h=1}^{r} (U_{h,n} - Q_n^2).$$

Some asymptotic properties for the estimators of $q_2$ and $h_2$ follow by combining the results of Ch. 2 in Lee (1990), Wang (1999), and the Slutsky theorem.
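As an illustration of the discrete estimators, the following sketch computes $Q_n$ and $H_n$ from a sample of hashable observations. It assumes that $Q_n$ is the proportion of coinciding pairs among all $\binom{n}{2}$ pairs, which is an interpretation of the garbled definition rather than a quotation; the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def quadratic_functional_estimate(xs: Sequence[Hashable]) -> float:
    """Assumed Q_n: fraction of pairs i < j with x_i == x_j."""
    n = len(xs)
    counts = Counter(xs)
    # A value observed c times contributes c * (c - 1) / 2 coinciding pairs.
    coincidences = sum(c * (c - 1) // 2 for c in counts.values())
    return coincidences / math.comb(n, 2)


def renyi_entropy_estimate(xs: Sequence[Hashable]) -> float:
    """H_n = -log(max(Q_n, 1/n)); the truncation at 1/n avoids log(0)."""
    n = len(xs)
    return -math.log(max(quadratic_functional_estimate(xs), 1.0 / n))


if __name__ == "__main__":
    sample = ["a", "b", "a", "c", "a", "b"]
    print(quadratic_functional_estimate(sample), renyi_entropy_estimate(sample))
```

Vector-valued observations can be passed as tuples. When no two observations coincide, $Q_n = 0$ and the truncation gives $H_n = \log n$.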

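Theorem 7

(i) For the expectation and variance, we obtain

$$\mathrm{E}(Q_n) = q_2 + O(n^{-1}), \qquad \mathrm{Var}(Q_n) = 4\zeta_{1,m}\, n^{-1} + O(n^{-2}) \quad \text{as } n \to \infty,$$

and thus $Q_n$ and $H_n$ are consistent estimators for $q_2$ and $h_2$, respectively.

(ii) If $\zeta_{1,m} > 0$ and $r \ge m$, then

$$\sqrt{n}(Q_n - q_2) \xrightarrow{D} N(0, 4\zeta_{1,m}) \quad \text{and} \quad \frac{\sqrt{n}}{2 s_{r,n}}(Q_n - q_2) \xrightarrow{D} N(0, 1);$$

$$\frac{\sqrt{n}\, Q_n}{2 s_{r,n}}(H_n - h_2) \xrightarrow{D} N(0, 1) \quad \text{as } n \to \infty.$$

As in the continuous case, we have $\zeta_{1,m} \ge 0$ with equality, e.g., if $\mathcal{P}$ is uniform.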